A PDP-11 target for lcc

I have made this commentary as a straightforward case study which I hope might assist anyone else who is trying to retarget lcc. I also hope that experts are willing to enlarge on its deficiences and thereby improve my understanding of lcc (and even the PDP-11 :) Please be free with constructive criticism.

Also, if anyone has suggestions to better explain the process of retargeting, please let me know. Perhaps this can be the basis for a tutorial on that subject.

I believe there already exists (somewhere) a PDP-11 back-end, which I have not seen. Knowing this didn't deter me from going through this self-educational exercise, however. It was also an opportunity to complete my knowledge of the One True Instruction Set. If the other implementation surfaces, it may provide an interesting comparison.

I found producing the back-end challenging and at times very frustrating. This is not to say lcc is at fault - far from it; it's extremely well put together, but despite the many assertions already in lcc, a machine description can be difficult to debug.

To forestall inevitable criticisms of the generated code: I am aware of very many imperfections, some no doubt due to bugs (or omissions) in the machine description, but others are certainly due to the broad approach taken by lcc. Targeting smaller architectures also rubs against some of its design assumptions.

Code sample

The following small function exploits several PDP-11-specific idioms.
unsigned countbits(unsigned v){ 
    register unsigned t=v,n;
    for( n=0 ; t ; t>>=1 )
        if(t&1)
            ++n;
    return n;
}

/* for better methods, see 
http://www.caam.rice.edu/~dougm/twiddle/BitCount.html */
 
countbits:
mov fp,-(sp) ; save frame pointer
mov sp,fp
mov r2,-(sp) ; save
mov r3,-(sp) ; save
;
mov 4(fp),r3 ; reg <- opd
clr r2
br 5$
2$:
bit  #1,r3
beq 6$
inc r2
6$:
3$:
clc
ror r3
5$:
tst r3
bne 2$
mov r2,r0 ; LOADU2
1$:
;
mov (sp)+,r3 ; restore
mov (sp)+,r2 ; restore
mov (sp)+,fp ; restore frame pointer
rts pc
(For assembly listing, see below.)

However, within these parameters, I think the generated code is not too bad (I would like to compare in detail to other PDP compilers such as DECUS C or pcc). Where there are exceptions I've tried to explain how they arise. Note that the instruction templates below include many assembler comments which I've found useful during debugging. Since this is a beta release, it would be premature to remove them.

Limitations

For instance, I believe it is difficult to make use of the PDP's auto-increment and auto-decrement modes, because lcc must represent the increment and dereference as separate trees, precluding templates. In any case, these PDP addressing modes apply only to operations on pointers. (A peephole pass after tree covering might be able to retrofit increments in obvious situations. Certainly a peephole pass would remove some redundant moves that lcc wants to emit.)

lcc's disdain for register starved architectures has so far proved a showstopper to the implementation of long arithmetic.

Since I have decided to use a permanent frame pointer (R5), five registers 0-4 remain. For char and short operations, this can mean two variables and three temporaries. However, for long operations, this allows only two register temporaries (it would be foolish to try to allocate long variables to registers) - and lcc simply hangs under such tight constraints.

It's not possible to write working templates for some byte operations, such as INCB/DECB; lcc's automatic insertion of LOAD operators breaks the match [e.g. ASGNI1(mem,ADDI2(CVII2(opd),con1)) ]

Bugs

  • Casting a constant to void* causes a fatal error, perhaps related to missing long support.
  • sizeof causes a fatal error, related to missing long support.

    The machine description

    The machine description, pdp11.md is input to lburg. Like conventional parser descriptions it begins with a C passthrough section bracketed by %{ and }%. This is standard boilerplate. Let there be no confusion about the license:
    %{
    
    /*
        This file is part of lcc-pdp11.md, an lcc machine description for PDP-11.
        Copyright (C) 2004 Toby Thain, toby@telegraphics.com.au
    
        This program is free software; you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by  
        the Free Software Foundation; either version 2 of the License, or
        (at your option) any later version.
    
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
    
        You should have received a copy of the GNU General Public License  
        along with this program; if not, write to the Free Software
        Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
        
        0.1b1, 16 Nov 2004
        0.2b1, 16 Nov 2004 - fix costs on CNST, EQ(BAND(),0)
        0.3b1, 16 Nov 2004 - fix local(), detect framesize==2 in function()
        0.4b1, 16 Nov 2004 - added rules for the BCOMU2(ADDU2(x,1)), lcc's "negate unsigned";
                             put nonzero costs on INC/DEC rules
        0.5b2, 18 Nov 2004 - shift-by-8 optimisations
        0.6b1, 20 Nov 2004 - fix ASH generation bugs, register saving etc
        0.6b2, 20 Nov 2004 - strip out some "long" rules
        0.7b1,  5 Dec 2004 - delete costs for base address rules.
                             also fix statics in function scope.
        0.7b2,  5 Dec 2004 - fix JMP/BR rules
        0.7b3,  5 Dec 2004 - improve handling of pointer constants (addresses of globals)
        0.7b4,  5 Dec 2004 - put all CALL variants in rules (don't use emit2)
    */
    
    #include <stdlib.h>
    #include "c.h"
    

    Next some constants that define the planned register layout. The PDP-11 has eight general purpose registers. R7 and R6 are the program counter (PC) and system stack pointer (SP) respectively, which leaves us with six, R0-R5. I've decided to use R5 as a frame pointer, leaving five registers for variables and temporaries.

    #define INTTMP 0x13             /* temps R4,R0/1 - three short or one long */
    #define INTVAR (0x1f ^ INTTMP)  /* rvars R2/3 - two short or one long */
    #define FLTTMP 0x31             /* float temps */
    #define FLTVAR (0x3f ^ FLTTMP)  /* the rest float vars */
    

    The following macros and function declarations are standard boilerplate.

    #define NODEPTR_TYPE Node
    #define OP_LABEL(p) ((p)->op)
    #define LEFT_CHILD(p) ((p)->kids[0])
    #define RIGHT_CHILD(p) ((p)->kids[1])
    #define STATE_LABEL(p) ((p)->x.state)
    
    static void address(Symbol, Symbol, long);
    ... some prototypes omitted ...
    static void target(Node);
    

    Declarations of "cost functions" used by the templates in this file:

    static int memop(Node);
    static int cnstrhs(Node);
    static int sametree(Node,Node);
    

    Now variables representing the machine registers. Although there are fewer than 32 registers, I superstitiously leave these arrays with 32 elements; lcc uses many 32-bit vectors to describe register sets. The divd register is explained below in progbeg(). The "wildcards" iregw,lregw,fregw define allocable register sets for short, long and floating point variables respectively.

    static Symbol ireg[32],lreg[32],freg[32],divd,iregw,lregw,fregw;
    
    %}
    

    Next come lburg's terminal definitions. These are generated by the utility ops (in etc directory). Each term encodes an operator, type (I,U,P,V,F,B), and size (1,2,4,8). These sizes are passed to ops; for this target I have chosen

  • char=1
  • short=2
  • int=2
  • long=4
  • float=4
  • double=8
  • pointer=2.

    Note that long, float and double operations are not yet included in this machine description.

    %start stmt
    
    %term CNSTF4=4113 CNSTF8=8209
    ... many definitions omitted ...
    %term VREGP=711
    

    In the next section are described our rule patterns and their corresponding templates. lcc's code generator uses these patterns to cover and reduce a syntax tree of operators. Templates ending in '\n' are complete instructions. The rest are instruction "parts", usually instruction operands, that are matched as part of larger rules.

    First, list a series of rules that reduce references to "VREG"s (variables in registers) to a register name. The nonterminal "reg" is used wherever we want to refer to a variable. These rules are common to all machine descriptions.

    A template beginning with '#' is treated here as merely a comment; rather than causing an instruction to be generated directly, this signals the code generator to call emit2() (defined below) to do it. We'll later see where this is useful. (emit2 need do nothing for these particular rules.)

    %%
    
    reg:    INDIRI1(VREGP)  "# read reg\n"
    reg:    INDIRU1(VREGP)  "# read reg\n"
    reg:    INDIRI2(VREGP)  "# read reg\n"
    reg:    INDIRU2(VREGP)  "# read reg\n"
    reg:    INDIRP2(VREGP)  "# read reg\n"
    freg:   INDIRF4(VREGP)  "# read reg\n"
    freg:   INDIRF8(VREGP)  "# read reg\n"
    

    The following rules reduce assignment to a register variable. This works because, by referring to the nonterminal "reg", we know that earlier rules have produced instructions that reference the register.

    stmt:   ASGNI1(VREGP,reg)   "# write reg\n"
    stmt:   ASGNU1(VREGP,reg)   "# write reg\n"
    stmt:   ASGNI2(VREGP,reg)   "# write reg\n"
    stmt:   ASGNU2(VREGP,reg)   "# write reg\n"
    stmt:   ASGNP2(VREGP,reg)   "# write reg\n"
    stmt:   ASGNF4(VREGP,freg)  "# write reg\n"
    stmt:   ASGNF8(VREGP,freg)  "# write reg\n"
    

    The LOAD rules implement register-to-register moves. Unlike other operators, these don't correspond to C semantic constructs but are compiler generated. (I'm not yet sure if/when LOADB is generated.)

    reg:    LOADB(reg)      "# foo\n"
    freg:   LOADF4(freg)    "# foo\n"               move(a)
    freg:   LOADF8(freg)    "# foo\n"               move(a)
    reg:    LOADI1(reg)     "movb %0,%c ; LOADI1\n" move(a)
    reg:    LOADU1(reg)     "movb %0,%c ; LOADU1\n" move(a)
    reg:    LOADI2(reg)     "mov %0,%c ; LOADI2\n"  move(a)
    reg:    LOADU2(reg)     "mov %0,%c ; LOADU2\n"  move(a)
    reg:    LOADP2(reg)     "mov %0,%c ; LOADP2\n"  move(a)
    

    The following rule "discards" the rare statements that reduce to a register reference. (See the lcc book for an example.)

    stmt:   reg     ""
    

    The rules for constants are the most trivial of all. These say that wherever the CNST operator appears, it should be substituted for a text fragment (its value). We'll refer to this fragment, nonterminal "con", in rules for assembling instructions.

    con:    CNSTI1  "%a"
    con:    CNSTU1  "%a"
    con:    CNSTI2  "%a"
    con:    CNSTU2  "%a"
    con:    CNSTP2  "%a"
    con:    addrg   "%a"
    

    The following are "special case" constants used for many optimisations. The cost function range(a,lo,hi) specifies the values for which the rule should match. (For a detailed explanation, see lcc book.)

    con0:   CNSTI1  "%a"    range(a,0,0)
    con0:   CNSTU1  "%a"    range(a,0,0)
    con0:   CNSTI2  "%a"    range(a,0,0)
    con0:   CNSTU2  "%a"    range(a,0,0)
    con0:   CNSTP2  "%a"    range(a,0,0)
    
    con1:   CNSTI2  "%a"    range(a,1,1)
    con1:   CNSTU2  "%a"    range(a,1,1)
    con2:   CNSTI2  "%a"    range(a,2,2)
    con2:   CNSTU2  "%a"    range(a,2,2)
    con8:   CNSTI2  "%a"    range(a,8,8)
    con8:   CNSTU2  "%a"    range(a,8,8)
    
    conm1:  CNSTI2  "%a"    range(a,-1,-1)
    

    The constants above were self-explanatory. The next ones describe some constrained ranges. Firstly shift amounts are restricted to -32..31 by the ASH instruction, so our templates for shifts ensure that ASH is not generated for other amounts (shifts outside this range don't make sense anyway, but this caution can prevent assembly errors). We don't directly handle negative shifts (which are undefined by the C standard anyway). The following rules define valid ranges for character constants.

    consh:  CNSTI2  "%a"    range(a,0,31)
    consh:  CNSTU2  "%a"    range(a,0,31)
    conc:   CNSTI2  "%a"    range(a,-128,127)
    

    The next rules define addressing modes and the fragments of operand text that specify them in instruction formats.

    Addresses of values in memory enter lcc's syntax tree as one of these three operators - ADDRG, ADDRL and ADDRF - address of Global, Local, or Formal parameter respectively. On the PDP-11, addresses of globals are always symbols, and locals and formals are always on the stack frame (and accessed using Register Indexed mode). (I have somewhat arbitrarily assigned costs for these modes based on whether they use an extra instruction word.)

    addr:   ADDRGP2         "%a"        1
    addr:   ADDRLP2         "%a(fp)"    1
    addr:   ADDRFP2         "%a(fp)"    1
    

    These parallel rules, which for stack variables give the "raw" offset rather than a formatted addressing mode, come in handy later.

    addrg:  ADDRGP2         "%a"
    addrf:  ADDRLP2         "%a"
    addrf:  ADDRFP2         "%a"
    

    Now come rules for two further addressing modes. Where an address is found in a register, the operand is accessed using Register Deferred mode. Where an operand is at a constant offset from an address in a register, we use Register Index mode.

    addr:   reg             "(%0)"
    addr:   ADDU2(reg,con)  "%1 (%0)"   1
    addr:   ADDI2(reg,con)  "%1 (%0)"   1
    addr:   ADDP2(reg,con)  "%1 (%0)"   1
    

    Now, we can bundle the above into two flavours of addresses.

    The first kind is simple addressing of a variable using the unchanged templates above. The second kind matches a new operator, INDIR, to show that the operand is actually a pointer to the variable's address. The PDP supports this indirection by "Deferred" variants of the modes already described, encoded by '@' in the instruction format.

    For example, to load a global variable G into a register, we might use MOV G,R0. Where G contains the address-of a variable (i.e. G is a pointer), and we wish to load the variable pointed-to (this is what the INDIR operator means), we would MOV @G,R0.

    mem:    addr            "%0"
    mem:    INDIRP2(addr)   "@ %0"
    

    The PDP lacks a "load effective address" instruction, so we manually construct equivalent calculations for the addressing modes above. This is used, for instance, when we take the address of a variable.

    To take the address of a global we take the 'literal' value of the symbol (format '#' means immediate constant). To take the address of a local variable or formal parameter, we load the frame pointer and add the appropriate offset.

    reg:    ADDRGP2 "mov # %a,%c ; reg <- ADDRGP2\n"    6
    reg:    ADDRLP2 "mov fp,%c\nadd # %a,%c\n"          6
    reg:    ADDRFP2 "mov fp,%c\nadd # %a,%c\n"          6
    

    We can always reduce a constant zero to a register, if needed, by using a CLR instruction. This is a typical use of the special-case constant rules given above.

    reg:    con0    "clr %c\n"  2
    

    Now a new, versatile non-terminal, "opd" (short for operand), is introduced, which represents yet another architectural concept: an instruction operand - in PDP parlance, a source or destination field of an instruction.

    Many addressing modes are possible for such an operand:

  • immediate constant operands are formed by taking a "con" nonterminal and prefixing '#'
  • any register is a valid operand (its name)
  • any "addr", as a variable reference (possibly Deferred).
    opd:    con             " #%0"  1
    opd:    reg             "%0"    
    opd:    INDIRI1(mem)    "%0"    
    opd:    INDIRU1(mem)    "%0"    
    opd:    INDIRI2(mem)    "%0"    
    opd:    INDIRU2(mem)    "%0"    
    opd:    INDIRP2(mem)    "%0"    
    

    Any "opd" can reduce to a register, by loading it into that register. This is the basic rule for MOV. (In templates, '%c' signifies the target register for a node.)

    reg:    opd     "mov %0,%c ; reg <- opd\n" 5
    

    Now it's time for an operator that actually does something: the ASGN operator. Such rules, rather than reducing to a "reg" or "opd" like the rules above, reduce to the special nonterminal "stmt" ("statement executed for side effect").

    To assign to a global variable G, for instance, the tree will look like ASGNx(ADDRGP2(G),...). The left side of the operator is always an address, so the nonterminal "addr" is exactly what we need here.

    The value assigned (right side) can take various forms: Either a general "opd" - constant, register or memory reference - or a special case such as constant zero. Note that without the latter rule, a register would be cleared using the "reg: con0" rule above, and then that register would be assigned via the rules below. By giving the rule for "con0" below, we provide a better reduction by matching the zero-assignment with a single CLR or CLRB (depending on whether the destination is a word or byte in size).

    First a bunch of byte assignments, then word equivalents:

    stmt:   ASGNI1(mem,opd)     "movb %1,%0 ; ASGNI1\n"
    stmt:   ASGNU1(mem,opd)     "movb %1,%0 ; ASGNU1\n"
    stmt:   ASGNI1(mem,con0)    "clrb %0 ; ASGNI1\n"
    stmt:   ASGNU1(mem,con0)    "clrb %0 ; ASGNU1\n"
    
    stmt:   ASGNI2(mem,opd)     "mov %1,%0 ; ASGNI2\n"
    stmt:   ASGNU2(mem,opd)     "mov %1,%0 ; ASGNU2\n"
    stmt:   ASGNP2(mem,opd)     "mov %1,%0 ; ASGNP2\n"
    stmt:   ASGNI2(mem,con0)    "clr %0 ; ASGNI2\n"
    stmt:   ASGNU2(mem,con0)    "clr %0 ; ASGNU2\n"
    stmt:   ASGNP2(mem,con0)    "clr %0 ; ASGNP2\n"
    

    With ASGN we have an opportunity to give rules for some useful optimisations. The following rules are for "in-place" modifications, such as might occur in a statement such as A += 2.

    Conceptually this is a match for a rule ASGN(A,OP(A,X)) where the two A's are equal addresses, OP is one of several applicable operations, and X is the second operand. The cost function memop(a) ensures that these rules only apply when the assignment destination and the operand address are the same (see lcc book for more details).

    We give rules for each PDP-11 operation that can be performed in-place on an operand (register or memory). You will note several uses of the special-case constants, such as "con1", for instance to match additions-by-1 with a template that emits a simple INC. (Ditto for single-bit shifts.)

    (Another tempting reduction/optimisation would be to match in-place operations on bytes such as INCB and DECB. This would be done with rules that resemble "ASGNI1(addr,ADDI2(CVII2(opd),con1))", but unfortunately lcc automatically inserts a LOAD operator above the right hand of this assignment, making it impossible to write rules that match. :( )

    stmt:   ASGNI2(mem,NEGI2(opd))          "neg %0\n"      memop(a)
    
    stmt:   ASGNI2(mem,ADDI2(opd,opd))      "add %2,%0\n"   memop(a)
    stmt:   ASGNU2(mem,ADDU2(opd,opd))      "add %2,%0\n"   memop(a)
    stmt:   ASGNP2(mem,ADDP2(opd,opd))      "add %2,%0\n"   memop(a)
    stmt:   ASGNI2(mem,ADDI2(opd,con1))     "inc %0\n"      memop(a)
    stmt:   ASGNU2(mem,ADDU2(opd,con1))     "inc %0\n"      memop(a)
    stmt:   ASGNP2(mem,ADDP2(opd,con1))     "inc %0\n"      memop(a)
    stmt:   ASGNI2(mem,ADDI2(opd,conm1))    "dec %0\n"      memop(a)
    stmt:   ASGNU2(mem,ADDU2(opd,conm1))    "dec %0\n"      memop(a)
    stmt:   ASGNP2(mem,ADDP2(opd,conm1))    "dec %0\n"      memop(a)
    
    stmt:   ASGNI2(mem,SUBI2(opd,opd))      "sub %2,%0\n"   memop(a)
    stmt:   ASGNU2(mem,SUBU2(opd,opd))      "sub %2,%0\n"   memop(a)
    stmt:   ASGNP2(mem,SUBP2(opd,opd))      "sub %2,%0\n"   memop(a)
    stmt:   ASGNI2(mem,SUBI2(opd,con1))     "dec %0\n"      memop(a)
    stmt:   ASGNU2(mem,SUBU2(opd,con1))     "dec %0\n"      memop(a)
    stmt:   ASGNP2(mem,SUBP2(opd,con1))     "dec %0\n"      memop(a)
    
    stmt:   ASGNI2(mem,LSHI2(opd,con1))     "asl %0\n"      memop(a)
    stmt:   ASGNU2(mem,LSHU2(opd,con1))     "asl %0\n"      memop(a)
    stmt:   ASGNI2(mem,LSHI2(opd,con8))     "swab %0\nclrb %0\n"        memop(a)
    stmt:   ASGNU2(mem,LSHU2(opd,con8))     "swab %0\nclrb %0\n"        memop(a)
    
    stmt:   ASGNI2(mem,RSHI2(opd,con1))     "asr %0\n"      memop(a)
    stmt:   ASGNU2(mem,RSHU2(opd,con1))     "clc\nror %0\n" memop(a)
    stmt:   ASGNU2(mem,RSHU2(opd,con8))     "clrb %0\nswab %0\n"    memop(a)
    stmt:   ASGNI2(mem,DIVI2(opd,con2))     "asr %0\n"      memop(a)
    stmt:   ASGNU2(mem,DIVU2(opd,con2))     "clc\nror %0\n" memop(a)
    

    (Note rules above to reduce a divide-by-2 to shifts. We could also recognise larger power-of-2 divisors in emit2 and generate ASH instructions. lcc does not do this by itself, although it does generate shift operations for multiply-by-2.)

    Matching rules for bitwise AND operator (BAND) raises a quirk of the PDP-11 instruction set, which does not have an instruction to perform AND directly. Instead, it has the BIC (Bit Clear) instruction, which is equivalent to AND with the complement of the source operand. This has a couple of implications.

    First, an instruction sequence to compute the AND of two registers will require the source operand to be complemented first. Of course lcc is not expecting us to "clobber" that operand, so we have to complement it back afterwards. (We could allow a general "opd" here but since we are accessing the source operand three times I decided to make lcc force it into a "reg" first.)

    stmt:   ASGNI2(mem,BANDI2(opd,reg))     "com %2 ; BANDI2\nbic %2,%0 ; BANDI2\ncom %2 ; BANDI2\n"    memop(a)
    stmt:   ASGNU2(mem,BANDU2(opd,reg))     "com %2 ; BANDU2\nbic %2,%0 ; BANDU2\ncom %2 ; BANDU2\n"    memop(a)
    

    Second, we can make lemonade from the lemon (of not having AND), by exploiting the semantics of BIC as a complemented AND. An expression such as A &= ~B can then reduce to a single PDP-11 instruction:

    stmt:   ASGNI2(mem,BANDI2(opd,BCOMI2(opd))) "bic %2,%0\n" memop(a)
    stmt:   ASGNI2(mem,BANDU2(opd,BCOMU2(opd))) "bic %2,%0\n" memop(a)
    

    The rest of the bitwise operations are straightforward, except noting that on the PDP-11, XOR's source must be a register.

    stmt:   ASGNI2(mem,BORI2(opd,opd))      "bis %2,%0\n"   memop(a)
    stmt:   ASGNU2(mem,BORU2(opd,opd))      "bis %2,%0\n"   memop(a)
    
    stmt:   ASGNI2(mem,BXORI2(opd,reg))     "xor %2,%0\n"   memop(a)
    stmt:   ASGNU2(mem,BXORU2(opd,reg))     "xor %2,%0\n"   memop(a)
    
    stmt:   ASGNI2(mem,BCOMI2(opd))         "com %0\n"      memop(a)
    stmt:   ASGNU2(mem,BCOMU2(opd))         "com %0\n"      memop(a)
    

    That concludes the combined assign-and-operate rules. Now we give a simple rule for struct assignment, or block copy. This delegates emit2 to emit a series of MOV instructions, or a macro call which expands to a copying loop. (For macro definitions, see lcc.mlb.)

    stmt:   ASGNB(reg,INDIRB(reg))  "# use emit2\n"
    

    In the spirit of the ASGN rules above, we present another optimisation technique. This one notes that, when building actual parameters to a function on the stack, we can perform some operations on the top-of-stack itself, which can save a temporary location.

    Some of the most useful top-of-stack operations are zero extension and sign extension when char-sized parameters are promoted to int, as required by C. We prefer to do this directly at the top of stack, rather than requiring a scratch register (or worse).

    An actual parameter of zero can be pushed trivially with a CLR.

    stmt:   ARGI2(opd)          "mov %0,-(sp) ; ARGI2\n"
    stmt:   ARGU2(opd)          "mov %0,-(sp) ; ARGU2\n"
    stmt:   ARGP2(opd)          "mov %0,-(sp) ; ARGP2\n"
    stmt:   ARGI2(CVII2(opd))   "movb %0,-(sp) ; ARGI2(CVII2)\n"
    stmt:   ARGU2(CVIU2(opd))   "movb %0,-(sp) ; ARGU2(CVIU2)\n"
    stmt:   ARGI2(CVUI2(opd))   "clr -(sp)\nbisb %0,(sp) ; ARGI2(CVUI2)\n"
    stmt:   ARGU2(CVUU2(opd))   "clr -(sp)\nbisb %0,(sp) ; ARGU2(CVUU2)\n"
    stmt:   ARGI2(con0)         "clr -(sp) ; ARGI2\n"
    stmt:   ARGU2(con0)         "clr -(sp) ; ARGU2\n"
    stmt:   ARGP2(con0)         "clr -(sp) ; ARGP2\n"
    

    There is a laundry list of other useful things we can do to parameters after they're pushed on to the stack. These largely correspond to the assignment rules above. In several cases we can sign-extend (CVIx2) and push in one MOVB; even zero extension (CVUx2) takes only two instructions and needs no temporary.

    stmt:   ARGI2(ADDI2(opd,opd))           "mov %0,-(sp)\nadd %1,(sp) ; ARGI2(ADDI1)\n"
    stmt:   ARGU2(ADDU2(opd,opd))           "mov %0,-(sp)\nadd %1,(sp) ; ARGU2(ADDU2)\n"
    stmt:   ARGP2(ADDP2(opd,opd))           "mov %0,-(sp)\nadd %1,(sp) ; ARGP2(ADDP2)\n"
    stmt:   ARGI2(ADDI2(opd,con1))          "mov %0,-(sp)\ninc (sp) ; ARGI2(ADDI1)\n"
    stmt:   ARGI2(ADDI2(CVII2(opd),con1))   "movb %0,-(sp)\ninc (sp) ; ARGI2(ADDI2(CVII2))\n"
    stmt:   ARGU2(ADDU2(CVIU2(opd),con1))   "movb %0,-(sp)\ninc (sp) ; ARGU2(ADDU2(CVIU2))\n"
    stmt:   ARGI2(ADDI2(CVUI2(opd),con1))   "clr -(sp)\nbisb %0,(sp)\ninc (sp) ; ARGI2(ADDI2(CVUI2))\n"
    stmt:   ARGU2(ADDU2(CVUU2(opd),con1))   "clr -(sp)\nbisb %0,(sp)\ninc (sp) ; ARGU2(ADDU2(CVUU2))\n"
    stmt:   ARGU2(ADDU2(opd,con1))          "mov %0,-(sp)\ninc (sp) ; ARGU2(ADDU2)\n"
    stmt:   ARGU2(ADDU2(BCOMU2(opd),con1))  "mov %0,-(sp)\nneg (sp) ; ARGU2(ADDU2(BCOMU2(opd),1))\n"
    stmt:   ARGP2(ADDP2(opd,con1))          "mov %0,-(sp)\ninc (sp) ; ARGP2(ADDP2)\n"
    stmt:   ARGP2(ADDP2(opd,conm1))         "mov %0,-(sp)\ndec (sp) ; ARGP2(ADDP2)\n"
    stmt:   ARGI2(SUBI2(opd,opd))           "mov %0,-(sp)\nsub %1,(sp) ; ARGI2(SUBI1)\n"
    stmt:   ARGU2(SUBU2(opd,opd))           "mov %0,-(sp)\nsub %1,(sp) ; ARGU2(SUBU2)\n"
    stmt:   ARGP2(SUBP2(opd,opd))           "mov %0,-(sp)\nsub %1,(sp) ; ARGP2(SUBP2)\n"
    stmt:   ARGI2(SUBI2(opd,con1))          "mov %0,-(sp)\ndec (sp) ; ARGI2(SUBI1)\n"
    stmt:   ARGU2(SUBU2(opd,con1))          "mov %0,-(sp)\ndec (sp) ; ARGU2(SUBU2)\n"
    stmt:   ARGP2(SUBP2(opd,con1))          "mov %0,-(sp)\ndec (sp) ; ARGP2(SUBP2)\n"
    stmt:   ARGI2(SUBI2(CVII2(opd),con1))   "movb %0,-(sp)\ndec (sp) ; ARGI2(SUBI2(CVII2))\n"
    stmt:   ARGU2(SUBU2(CVIU2(opd),con1))   "movb %0,-(sp)\ndec (sp) ; ARGU2(SUBU2(CVIU2))\n"
    stmt:   ARGI2(SUBI2(CVUI2(opd),con1))   "clr -(sp)\nbisb %0,(sp)\ndec (sp) ; ARGI2(SUBI2(CVUI2))\n"
    stmt:   ARGU2(SUBU2(CVUU2(opd),con1))   "clr -(sp)\nbisb %0,(sp)\ndec (sp) ; ARGU2(SUBU2(CVUU2))\n"
    stmt:   ARGI2(BORI2(opd,opd))           "mov %0,-(sp)\nbis %1,(sp) ; ARGI2(BORI2)\n"
    stmt:   ARGI2(BANDI2(reg,con))          "# give to emit2\n"
    stmt:   ARGU2(BANDU2(reg,con))          "# give to emit2\n"
    stmt:   ARGU2(BORU2(opd,opd))           "mov %0,-(sp)\nbis %1,(sp) ; ARGU2(BORU2)\n"
    stmt:   ARGI2(BXORI2(opd,reg))          "mov %0,-(sp)\nxor %1,(sp) ; ARGI2(BXORI2)\n"
    stmt:   ARGU2(BXORU2(opd,reg))          "mov %0,-(sp)\nxor %1,(sp) ; ARGU2(BXORU2)\n"
    stmt:   ARGI2(LSHI2(opd,con1))          "mov %0,-(sp)\nasl (sp) ; ARGI2(LSHI2)\n"
    stmt:   ARGU2(LSHU2(opd,con1))          "mov %0,-(sp)\nasl (sp) ; ARGU2(LSHU2)\n"
    stmt:   ARGI2(MULI2(opd,con2))          "mov %0,-(sp)\nasl (sp) ; ARGI2(MULI2)\n"
    stmt:   ARGU2(MULU2(opd,con2))          "mov %0,-(sp)\nasl (sp) ; ARGU2(MULU2)\n"
    stmt:   ARGI2(RSHI2(opd,con1))          "mov %0,-(sp)\nasr (sp) ; ARGI2(RSHI2)\n"
    stmt:   ARGU2(RSHU2(opd,con1))          "mov %0,-(sp)\nclc\nror (sp) ; ARGU2(RSHU2)\n"
    stmt:   ARGI2(DIVI2(opd,con2))          "mov %0,-(sp)\nasr (sp) ; ARGI2(DIVI2)\n"
    stmt:   ARGU2(DIVU2(opd,con2))          "mov %0,-(sp)\nclc\nror (sp) ; ARGU2(DIVU2)\n"
    stmt:   ARGI2(NEGI2(opd))               "mov %0,-(sp)\nneg (sp) ; ARGI2(NEGI2)\n"
    stmt:   ARGI2(BCOMI2(opd))              "mov %0,-(sp)\ncom (sp) ; ARGI2(BCOMI2)\n"
    stmt:   ARGU2(BCOMU2(opd))              "mov %0,-(sp)\ncom (sp) ; ARGU2(BCOMU2)\n"
    

    The following two rules optimise the case where an actual parameter is a computed address. Again, it's most efficient to do the offset computation (add) on the top-of-stack. (We can't use the "mem" nonterminal because the template is wrong: it is built as an addressing mode indexing the frame pointer. In this case we need the bare offset, which is provided by nonterminal "memf".)

    stmt:   ARGP2(addrf)    "mov fp,-(sp)\nadd #%0,(sp) ; ARGP2(frame)\n"
    

    There are two kinds of rules for CALL operators. The first kind is more general, and can deal with, for instance, Deferred addressing: where the called function address is in a variable. The arguments have already been pushed on the stack, and after the call we must adjust the stack pointer to remove them.

    reg:    CALLI2(mem) "jsr pc,%0\n"                               (a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
    reg:    CALLI2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"         (a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
    reg:    CALLI2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"   (a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
    reg:    CALLI2(mem) "jsr pc,%0\nadd #%a,sp ; pop args\n"        2
    reg:    CALLU2(mem) "jsr pc,%0\n"                               (a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
    reg:    CALLU2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"         (a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
    reg:    CALLU2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"   (a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
    reg:    CALLU2(mem) "jsr pc,%0\nadd #%a,sp ; pop args\n"        2
    reg:    CALLP2(mem) "jsr pc,%0\n"                               (a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
    reg:    CALLP2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"         (a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
    reg:    CALLP2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"   (a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
    reg:    CALLP2(mem) "jsr pc,%0\nadd #%a,sp ; pop args\n"        2
    
    stmt:   CALLV(mem)  "jsr pc,%0\n"                               (a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
    stmt:   CALLV(mem)  "jsr pc,%0\ntst (sp)+ ; pop args\n"         (a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
    stmt:   CALLV(mem)  "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"   (a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
    stmt:   CALLV(mem)  "jsr pc,%0\nadd #%a,sp ; pop args\n"        2
    stmt:   CALLB(mem)  "jsr pc,%0\n"                               (a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
    stmt:   CALLB(mem)  "jsr pc,%0\ntst (sp)+ ; pop args\n"         (a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
    stmt:   CALLB(mem)  "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"   (a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
    stmt:   CALLB(mem)  "jsr pc,%0\nadd #%a,sp ; pop args\n"        2
    

    RET (return) operators require nothing from us, nor from emit2. We do however use them for register targeting (see target), since the convention I've used passes function return values in R0.

    stmt:   RETF4(freg) "# RETF4\n"
    stmt:   RETF8(freg) "# RETF8\n"
    stmt:   RETI2(reg)  "# RETI2\n"
    stmt:   RETU2(reg)  "# RETU2\n"
    stmt:   RETP2(reg)  "# RETP2\n"
    stmt:   RETV        "# RETV\n"
    

    Now we deal with type conversions. We are only concerned here with the widening conversions, the four operators:

  • CVII2: widen signed char to signed int (sign extend)
  • CVUI2: widen unsigned char to signed int (zero extend)
  • CVIU2: widen signed char to unsigned int (sign extend)
  • CVUU2: widen unsigned char to unsigned int (zero extend)

    The PDP-11's instruction MOVB sign extends a byte to a word. It takes one more instruction to zero extend, which explains why signed chars are preferable on this target (right shifts are another example of a penalised unsigned operation).

    The conversions marked '#' are idempotent on this target (unless I am mistaken about the subtleties of conversion operators), and emit2 does nothing about them.

    reg:    CVII1(reg)  "mov %0,%c ; CVII1\n" move(a)
    reg:    CVIU1(reg)  "mov %0,%c ; CVIU1\n" move(a)
    reg:    CVUI1(reg)  "mov %0,%c ; CVUI1\n" move(a)
    reg:    CVUU1(reg)  "mov %0,%c ; CVUU1\n" move(a)
    reg:    CVII2(opd)  "movb %0,%c ; CVII2\n"  4
    reg:    CVIU2(opd)  "movb %0,%c ; CVIU2\n"  4
    reg:    CVPU2(reg)  "mov %0,%c ; CVPU2\n" move(a)
    reg:    CVUP2(reg)  "mov %0,%c ; CVUP2\n" move(a)
    reg:    CVUI2(opd)  "movb %0,%c ; CVUI2\nbic #^o-400,%c ; CVUI2\n"  8
    reg:    CVUU2(opd)  "movb %0,%c ; CVUU2\nbic #^o-400,%c ; CVUU2\n"  8
    

    After those dry conversions, it's time for dessert: the arithmetic operations. Again, you'll note special cases for add-by-one, shift-by-one. The costs attached to the end of these rules try to encourage the special cases to be used where the code generator has a choice.

    The pattern of operand reductions should also affect these cost calculations, so that where an operand can be entirely avoided, that expansion will be preferred. (That's the theory, anyway. In practice, choosing these costs seems a black art.)

    reg:    NEGI2(reg)  "?mov %0,%c\nneg %c\n"  2
    

    (The '?' suppresses the MOV where source (%0) and destination (%c) registers are the same. See lcc book.)

    For unsigned right-shifts where the shift is not a constant (i.e. it's in a register or in memory), we invoke a macro to do the shift with the ASH instruction. ASH always sign extends when it shifts right, so the macro includes code to mask away any sign extension (see lcc.mlb). We even need a macro for signed right shift, because the shift count operand must be negated before ASH can use it.

    It's worth explaining the rule for ADDU2(BCOMU2(reg),con1) since this seems an obscure expression. When "negating" an unsigned number, rather than invent an oxymoronic "NEGU" operation, lcc prefers to convert the negate operator into an unsigned complement and then add 1. Makes perfect sense, but the template can concisely specify the NEG instruction here (this special case also appeared earlier in one of the ARG rules).

    Where an unsigned value is shifted right by a constant amount, we let to emit2 to generate the shift followed by a "Bit Clear" with a computed mask to zero out the unwanted sign extension.

    (A related rule below optimises the obscure case of a right shift combined with a constant-bitwise-AND to use just one BIC. For instance, on an unsigned variable U, we can compute (U>>3)&7 with just one BIC. Without this optimisation we would be dismayed to see two BICs in a row, one generated for the shift, and one for the &.)

    reg:    ADDI2(reg,opd)      "?mov %0,%c\nadd %1,%c\n"
    reg:    ADDU2(reg,opd)      "?mov %0,%c\nadd %1,%c\n"
    reg:    ADDP2(reg,opd)      "?mov %0,%c\nadd %1,%c\n"
    reg:    ADDI2(reg,con1)     "?mov %0,%c\ninc %c\n"
    reg:    ADDU2(reg,con1)     "?mov %0,%c\ninc %c\n"
    reg:    ADDP2(reg,con1)     "?mov %0,%c\ninc %c\n"
    reg:    ADDI2(reg,conm1)    "?mov %0,%c\ndec %c\n"
    reg:    ADDU2(reg,conm1)    "?mov %0,%c\ndec %c\n"
    reg:    ADDP2(reg,conm1)    "?mov %0,%c\ndec %c\n"
    reg:    ADDU2(BCOMU2(reg),con1)     "?mov %0,%c\nneg %c\n"
    
    reg:    SUBI2(reg,opd)      "?mov %0,%c\nsub %1,%c\n"
    reg:    SUBU2(reg,opd)      "?mov %0,%c\nsub %1,%c\n"
    reg:    SUBP2(reg,opd)      "?mov %0,%c\nsub %1,%c\n"
    reg:    SUBI2(reg,con1)     "?mov %0,%c\ndec %c\n"
    reg:    SUBU2(reg,con1)     "?mov %0,%c\ndec %c\n"
    reg:    SUBP2(reg,con1)     "?mov %0,%c\ndec %c\n"
    
    reg:    LSHI2(reg,opd)      "?mov %0,%c\nash %1,%c\n"   6
    reg:    LSHU2(reg,opd)      "?mov %0,%c\nash %1,%c\n"   6
    reg:    LSHI2(reg,con1)     "?mov %0,%c\nasl %c\n"  3
    reg:    LSHU2(reg,con1)     "?mov %0,%c\nasl %c\n"  3
    reg:    LSHI2(reg,con8)     "?mov %0,%c\nswab %c\nclrb %c\n"    5
    reg:    LSHU2(reg,con8)     "?mov %0,%c\nswab %c\nclrb %c\n"    5
    
    reg:    RSHI2(reg,opd)      "?mov %0,%c\n$RSHI %1,%c\n" 6
    reg:    RSHU2(reg,opd)      "?mov %0,%c\n$RSHU %1,%c\n" 6
    reg:    RSHI2(reg,consh)    "?mov %0,%c\nash #-%1,%c\n" 6
    reg:    RSHU2(reg,consh)    "?mov %0,%c\nash #-%1,%c\n" 6
    reg:    RSHI2(reg,con1)     "?mov %0,%c\nasr %c\n"  3
    reg:    RSHU2(reg,con1)     "?mov %0,%c\nclc\nror %c\n" 3
    reg:    RSHU2(reg,con8)     "?mov %0,%c\nclrb %c\nswab %c\n"    5
    reg:    RSHI2(reg,consh)    "?mov %0,%c\nash #-%1,%c\n" 3
    reg:    RSHU2(reg,consh)    "# emit2 does shift and mask\n" 3
    

    For constant-bitwise-AND, we simply complement the constant first, and emit a BIC.

    reg:    BANDI2(reg,reg)         "?mov %0,%c ; BANDI2\ncom %1 ; BANDI2\nbic %1,%c ; BANDI2\ncom %1 ; BANDI2\n"   8
    reg:    BANDU2(reg,reg)         "?mov %0,%c ; BANDU2\ncom %1 ; BANDU2\nbic %1,%c ; BANDU2\ncom %1 ; BANDU2\n"   8
    reg:    BANDI2(reg,con)         "# emit2 helps out\n"
    reg:    BANDU2(reg,con)         "# emit2 helps out\n"
    reg:    BANDI2(reg,BCOMI2(opd)) "?mov %0,%c\nbic %1,%c\n"
    reg:    BANDU2(reg,BCOMU2(opd)) "?mov %0,%c\nbic %1,%c\n"
    reg:    BANDI2(CVUI2(reg),con)  "# job for emit2\n"
    reg:    BANDU2(CVUU2(reg),con)  "# job for emit2\n"
    reg:    BANDU2(RSHU2(reg,consh),con)    "# emit2 does shift and mask\n"
    
    reg:    BORI2(reg,opd)  "?mov %0,%c\nbis %1,%c\n"
    reg:    BORU2(reg,opd)  "?mov %0,%c\nbis %1,%c\n"
    
    reg:    BXORI2(reg,reg) "?mov %0,%c\nxor %1,%c\n"
    reg:    BXORU2(reg,reg) "?mov %0,%c\nxor %1,%c\n"
    
    reg:    BCOMI2(reg)     "?mov %0,%c\ncom %c\n"  2
    reg:    BCOMU2(reg)     "?mov %0,%c\ncom %c\n"  2
    
    DIV, MOD and MUL are only curious in that they are restricted in their operand registers. Specifically, the DIV instruction takes a 32-bit register even/odd pair (we choose R0/1) as its dividend. The divisor is the source operand (%1). The quotient and remainder are produced in R0 and R1 respectively (see target() below).

    Again, we optimise divide-by-2 into a shift, since lcc declines to do this by itself (probably because it's only super-cheap if the target does arithmetic shift, and not all do). Unsigned divide uses a runtime support subroutine that uses the signed DIV instruction, but checks some special cases first.

    reg:    DIVI2(reg,opd)  "tst r1 ; DIVI2\nsxt r0 ; DIVI2\ndiv %1,r0 ; DIVI2\n"   16
    reg:    DIVU2(reg,reg)  "jsr pc,$DIVU ; DIVU2\n"
    reg:    DIVI2(reg,con2) "?mov %0,%c\nasr %c\n"
    reg:    DIVU2(reg,con2) "?mov %0,%c\nclc\nror %c\n"
    
    reg:    MODI2(reg,opd)  "tst r1 ; MODI2\nsxt r0 ; MODI2\ndiv %1,r0 ; MODI2\n"
    reg:    MODU2(reg,reg)  "jsr pc,$DIVU ; MODU2\n"
    
    reg:    MULI2(reg,opd)  "mul %1,r1 ; MULI2\n"
    reg:    MULU2(reg,opd)  "mul %1,r1 ; MULU2\n"
    

    Now what you've all been waiting for, the conditional operators. First it should be noted that all of these rely upon conditional branches, which have a restricted range. This does put a limit on compiled function size, as does the rule below for JUMP.

    We could solve this by brute force, inverting the sense of a branch to hop over an unconditional far jump (e.g. BEQ A becomes BNE HOP, JMP A, HOP:), but this would be very gauche; taking Pollyanna's view, the branch limits seem a very neat way to enforce modularisation of big ugly functions, and I found it was a powerful motivator to me, writing this back-end, to find shorter instruction sequences to avoid branch-range fouls!

    Of course, there will be people who can't tolerate the assembler asking them to break up their multipage functions. Those people can insert the branch diddling solution I just mentioned. :)

    Note that we recognise and reduce some char-conversion operators, and emit CMPB as appropriate. This is where we finally get to use the "conc" nonterminal. (Where "conc" doesn't match, we must have a degenerate compare (e.g. char == 900) and lcc will take the long route, extending with a MOVB and using a word CMP.)

    Compares to zero can exploit the cheaper TST instruction. Even more so for byte compares to zero, which optimise away a type conversion.

    stmt:   EQI2(opd,opd)   "cmp %0,%1\nbeq %a\n"               8
    stmt:   EQU2(opd,opd)   "cmp %0,%1\nbeq %a\n"               8
    stmt:   EQI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbeq %a\n"  5
    stmt:   EQI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbeq %a\n"  5
    stmt:   EQI2(CVII2(opd),consc)      "cmpb %0,#%1\nbeq %a\n"     5
    stmt:   EQI2(opd,con0)  "tst %0\nbeq %a\n"                  3
    stmt:   EQU2(opd,con0)  "tst %0\nbeq %a\n"                  3
    stmt:   EQI2(CVII2(opd),con0)       "tstb %0\nbeq %a\n"         3
    stmt:   EQI2(CVUI2(opd),con0)       "tstb %0\nbeq %a\n"         3
    

    This is where it gets interesting. The PDP-11 has a combined "bitwise-AND then compare to zero" instruction, Bit Test (BIT), and its byte-sized equivalent, BITB. Let's make the most of them:

    stmt:   EQI2(BANDI2(opd,opd),con0)  "bit %1,%0\nbeq %a\n"
    stmt:   EQU2(BANDU2(opd,opd),con0)  "bit %1,%0\nbeq %a\n"
    stmt:   EQI2(BANDI2(CVII2(opd),opd),con0)   "bitb %1,%0\nbeq %a\n"
    stmt:   EQI2(BANDI2(CVUI2(opd),opd),con0)   "bitb %1,%0\nbeq %a\n"
    stmt:   EQU2(BANDU2(CVIU2(opd),opd),con0)   "bitb %1,%0\nbeq %a\n"
    stmt:   EQU2(BANDU2(CVUU2(opd),opd),con0)   "bitb %1,%0\nbeq %a\n"
    

    And all the same logic applies to NE (not equal) en bloc, except the BNE conditional branch is used.

    stmt:   NEI2(opd,opd)   "cmp %0,%1\nbne %a\n"               8
    stmt:   NEU2(opd,opd)   "cmp %0,%1\nbne %a\n"               8
    stmt:   NEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbne %a\n"  5
    stmt:   NEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbne %a\n"  5
    stmt:   NEI2(CVII2(opd),consc)      "cmpb %0,#%1\nbne %a\n"     5   
    stmt:   NEI2(opd,con0)  "tst %0\nbne %a\n"                  3
    stmt:   NEU2(opd,con0)  "tst %0\nbne %a\n"                  3
    stmt:   NEI2(CVII2(opd),con0)       "tstb %0\nbne %a\n"         3
    stmt:   NEI2(CVUI2(opd),con0)       "tstb %0\nbne %a\n"         3
    stmt:   NEI2(BANDI2(opd,opd),con0)  "bit %1,%0\nbne %a\n"
    stmt:   NEU2(BANDU2(opd,opd),con0)  "bit %1,%0\nbne %a\n"
    stmt:   NEI2(BANDI2(CVII2(opd),opd),con0)   "bitb %1,%0\nbne %a\n"
    stmt:   NEI2(BANDI2(CVUI2(opd),opd),con0)   "bitb %1,%0\nbne %a\n"
    stmt:   NEU2(BANDU2(CVIU2(opd),opd),con0)   "bitb %1,%0\nbne %a\n"
    stmt:   NEU2(BANDU2(CVUU2(opd),opd),con0)   "bitb %1,%0\nbne %a\n"
    

    The remaining relational operators are only interesting in that the PDP-11 has conditional branches useful for unsigned comparisons (BLO, BLOS, BHI, BHIS).

    stmt:   LTI2(opd,opd)   "cmp %0,%1\nblt %a\n"               8
    stmt:   LTU2(opd,opd)   "cmp %0,%1\nblo %a\n"               8
    stmt:   LTI2(opd,con0)  "tst %0\nbmi %a\n"
    stmt:   LTI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nblt %a\n"
    stmt:   LTI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nblo %a\n"
    stmt:   LTI2(CVII2(opd),consc)      "cmpb %0,#%1\nblt %a\n"     5
    stmt:   LTI2(CVII2(opd),con0)       "tstb %0\nbmi %a\n"
    
    stmt:   LEI2(opd,opd)   "cmp %0,%1\nble %a\n"               8
    stmt:   LEU2(opd,opd)   "cmp %0,%1\nblos %a\n"              8
    stmt:   LEI2(opd,con0)  "tst %0\nble %a\n"
    stmt:   LEU2(opd,con0)  "tst %0\nbeq %a\n"
    stmt:   LEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nble %a\n"
    stmt:   LEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nblos %a\n"
    stmt:   LEI2(CVII2(opd),consc)      "cmpb %0,#%1\nble %a\n"     5
    stmt:   LEI2(CVII2(opd),con0)       "tstb %0\nble %a\n"
    stmt:   LEI2(CVUI2(opd),con0)       "tstb %0\nbeq %a\n"
    
    stmt:   GTI2(opd,opd)   "cmp %0,%1\nbgt %a\n"               8
    stmt:   GTU2(opd,opd)   "cmp %0,%1\nbhi %a\n"               8
    stmt:   GTI2(opd,con0)  "tst %0\nbgt %a\n"
    stmt:   GTU2(opd,con0)  "tst %0\nbne %a\n"
    stmt:   GTI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbgt %a\n"
    stmt:   GTI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbhi %a\n"
    stmt:   GTI2(CVII2(opd),consc)      "cmpb %0,#%1\nbgt %a\n"     5
    stmt:   GTI2(CVII2(opd),con0)       "tstb %0\nbgt %a\n"
    stmt:   GTI2(CVUI2(opd),con0)       "tstb %0\nbne %a\n"
    
    stmt:   GEI2(opd,opd)   "cmp %0,%1\nbge %a\n"               8
    stmt:   GEU2(opd,opd)   "cmp %0,%1\nbhis %a\n"              8
    stmt:   GEI2(opd,con0)  "tst %0\nbpl %a\n"
    stmt:   GEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbge %a\n"
    stmt:   GEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbhis %a\n"
    stmt:   GEI2(CVII2(opd),consc)      "cmpb %0,#%1\nbge %a\n"     5
    stmt:   GEI2(CVII2(opd),con0)       "tstb %0\nbpl %a\n"
    

    The remaining two rules are trivial. Note that the "addr" nonterminal can theoretically produce addressing modes that are incompatible with BR, but I can't think of C constructs that might result in such a tree. For instance, it's only possible to goto a label.

    stmt:   JUMPV(addrg)    "br %0\n"
    stmt:   JUMPV(mem)      "jmp %0\n"  2
    
    stmt:   LABELV          "%a:\n"
    
    %%
    

    That's it for the rules. Now we cover the support functions. First some fairly standard cost functions.

    memop() detects the case ASGN(A,OP(A,...)) as required for the in-place operations.

    static int memop(Node p) {
        assert( p && generic(p->op) == ASGN );
        assert( p->kids[0] && p->kids[1] && p->kids[1]->kids[0] );
        if (generic(p->kids[1]->kids[0]->op) == INDIR
        && sametree(p->kids[0], p->kids[1]->kids[0]->kids[0]))
            return 1;
        else
            return LBURG_MAX;
    }
    
    static int sametree(Node p, Node q) {
        return (p == NULL && q == NULL)
            || (p && q && p->op == q->op && p->syms[0] == q->syms[0]
                && sametree(p->kids[0], q->kids[0])
                && sametree(p->kids[1], q->kids[1]));
    }
    

    cnstrhs() matches a constant right-hand of a rule.

    static int cnstrhs(Node p) {
        assert(p && p->kids[1]);
        return generic(p->kids[1]->op) == CNST ? 1 : LBURG_MAX;
    }
    

    progbeg() emits the assembly source header. I believe it's polite to name the assembler you are targeting right here. It can be very difficult on some platforms to deduce the flavour of assembly, otherwise, for someone confronted with your back-end output.

    Importantly, here is where we define the machine registers, r0-r7. We don't let lcc use all of them; we set tmask and vmask to bit masks of the allowed register sets for temporaries and variables respectively.

    static void progbeg(int argc, char *argv[]) {
        int i;
    
        parseflags(argc, argv);
    
        print("; PDP-11 assembly generated by lcc 4.2\n");
        print("; assemble with MACRO-11\n");
        print(".INCLUDE /LCC.MLB/\n");
        //print(".MCALL $CPYB,$CPYW,$LSH,$RSHI,$RSHU\n");
        print(".RADIX 10\n");
        print("fp=%%5 ; R5 is frame pointer\n");
        print(".ENABL LSB ; file-wide local symbols\n\n");
        //print(".LIST ME ; list macro expansions\n\n");
    
        for(i=0;i<=7;++i){
            ireg[i] = mkreg("r%d",i,1,IREG);
            freg[i] = mkreg("ac%d",i,1,FREG);
        }
    
        //quo = mkreg("r0",0,1,IREG); quo->x.regnode->mask |= 2;
        //rem = mkreg("r1",1,1,IREG); rem->x.regnode->mask |= 1;
    
        for(i=0;i<=3;++i){
            lreg[i] = mkreg("%%%d",i*2,3,IREG);
        }
        tmask[FREG] = FLTTMP; vmask[FREG] = FLTVAR;
        tmask[IREG] = INTTMP; vmask[IREG] = INTVAR;
    
        iregw = mkwildcard(ireg);
        lregw = mkwildcard(lreg);
        fregw = mkwildcard(freg);
    }
    

    Return the corresponding register wildcard for a given operator type (Block, Pointer, Int, Unsigned int, Floating).

    static Symbol rmap(int opk) {
        switch (optype(opk)) {
        case B: case P:
            return iregw;
        case I: case U:
            return opsize(opk)==4 ? lregw : iregw;
        case F: 
            return fregw;
        default:
            return 0;
        }
    }
    
    static void segment(int n) {
    /* put everything in default PSECT for now */
    }
    
    static void progend(void) {
        print(".END\n");
    }
    
    #define INTCONST(P,V) ( generic(P->op)==CNST && \
         ( (optype(P->op)==I && P->syms[0]->u.c.v.i==V) \
        || (optype(P->op)==U && P->syms[0]->u.c.v.u==V) ) )
    

    The target() function allows us to specify particular registers we want operands in, for operators that need this.

    On the PDP-11, we expect DIV and MUL to operate on specific registers, and those nodes also produce a result in a specific register. Set these with rtarget and setreg (see lcc book).

    Our calling convention puts function returns in R0, so we specify that CALL nodes reduce to that register, and RET nodes must develop their value therein.

    static void target(Node p) {
        int op = specific(p->op);
        assert(p);
        switch (op) {
        case MUL+I: case MUL+U:
            /*  [ Re,o means the register pair Rn (even) and Rn+1 (odd) ]
                both multiplier (src) and multiplicand (R) are 16 bits
                    32-bit product: MUL src,Re    Re,o <- Re * src
                    16-bit product: MUL src,Ro  Ro <- Ro * src
                We always use R1 for 16-bit multiplicand and product */
            if(!INTCONST(p->kids[1],2)){ /* x2 is converted to left shift  */
                rtarget(p, 0, ireg[1]); /* kids[0] must deliver multiplicand in R1 */
                setreg(p, ireg[1]); /* product in R1 */
            }
            break;
        case DIV+I: case DIV+U:
            /*  32-bit divide:      DIV src,Re  [dst always even register]
                divisor (src) is 16 bits; dividend (Re,o) is 32 bits
                Note hi word of dividend is R0, lo word R1
                    Re <- 16 bit quotient of Re,o / src
                    Ro <- 16 bit remainder
                We always use R0 for quotient and dividend */
            if(!INTCONST(p->kids[1],2)){ /* /2 is converted to right shift  */
                if(op == DIV+U) /* special case for unsigned divide, put divisor in R0 */
                    rtarget(p,1,ireg[0]);
                rtarget(p, 0, ireg[1]); /* kids[0] must deliver dividend into R1 */
                setreg(p, ireg[0]);     /* produces quotient in R0 */
            }
            break;
        case MOD+I: case MOD+U:
            if(op == MOD+U) /* special case for unsigned divide, put divisor in R0 */
                rtarget(p,1,ireg[0]);
            rtarget(p, 0, ireg[1]); /* kids[0] must deliver dividend into R1 */
            setreg(p, ireg[1]);     /* produces remainder in R1 */
            break;
        case CALL+I: case CALL+U: case CALL+P: case CALL+V:
            setreg(p, opsize(p->op) == 4 ? lreg[0] : ireg[0]);
            break;
        case CALL+F:
            setreg(p, freg[0]);
            break;
        case RET+I: case RET+U: case RET+P:
            rtarget(p, 0, ireg[0]);
            break;
        case RET+F:
            rtarget(p, 0, freg[0]);
            break;
        }
    }
    

    There may be opportunities to use clobber on this target (for instance DIV and MUL, or where support macros trash registers) but I don't understand it fully yet; my attempts at using it produce lcc errors. As a workaround, the support macros conservatively preserve registers, and I seem to have got away with not telling lcc that DIV trashes anything. The use of the special "divd" register, which covers R0 and R1, may be enough for lcc to realise that both registers are trashed. I won't pretend to know more about clobber than I do. :)

    static void clobber(Node p) {
    }
    
    

    Helper routines for emit2().

    static void emitband(char *dstname,unsigned andmask){
        print("bic #^o%o,%s ; BANDx 0%o\n", ~andmask & 0xffff, dstname, andmask);
    }
    
    static void emitrshu(Node p,char *dstname,unsigned andmask){
        Symbol src;
        unsigned shift,mask;
        
        assert(p->kids[1]);
        assert(p->kids[1]->syms[0]);
        assert(generic(p->kids[1]->op) == CNST);
        
        src = ireg[getregnum(p->kids[0])];
        shift = p->kids[1]->syms[0]->u.c.v.u;
    //dumptree(p);fputc('\n',stderr);
    //fprintf(stderr,"emit2: emitrshu: emitted=%d src=%s dst=%s shift=%d &mask=0%o\n", 
    //  p->x.emitted, src->x.name, dstname,shift,andmask);
    
        if(shift > 15 || !andmask) /* fussy optimisation; this would not normally occur */
            print("clr %s ; RSHU\n",dstname);
        else{
            if (dstname != src->x.name)
                print("mov %s,%s ; RSHU\n", src->x.name, dstname);
            if(shift > 0){
                if(shift == 1)
                    print("clc ; RSHU\nror %s ; RSHU\n", dstname);
                else
                    print("ash #%d,%s ; RSHU\n", -shift, dstname);
                //mask = ~(andmask & (0xffff >> shift)) & 0xffff;
                emitband(dstname,andmask & (0xffff >> shift));
                //print("bic #^o%o,%s ; RSHU %d, &mask 0%o\n", mask, dstname, shift, andmask);
            }
        }
    }
    

    The big ball of wax: emit2(). Let's discuss each operator singly.

    static void emit2(Node p) {
        int op = generic(p->op);
        unsigned shift,mask,size,align,i;
        Symbol src,dst;
        char *m,*d,*s;
        
        if(optype(p->op) == F){
    		/* Floating point operations not impl yet */
        }else
            switch( op ){
            //default: dumptree(p); fputc('\n',stderr); break; /* debugging only*/
    

    Emit ASGNB: block copies.
    There are three basic ways we can do block copies:

  • inline (unrolled) moves
  • macro which expands to a loop
  • subroutine call

    For copies not larger than 16 bytes, we decide to unroll into moves. (This threshold is chosen because it's about the same length as the slower macro loop, so for shorter copies, we save neither code space or time by using the macro.)

    For larger copies, we could either expand a loop inline (macro), or call a subroutine... the macro is 9-10 instructions but, being inline, allows flexibility of source and destination registers. The subroutine call would be fewer inline instructions, but would constrain register targeting.

    Since we're register starved, prefer macro for now: $CPYB src,dst,n for unaligned copies, $CPYW src,dst,n for aligned copies (can be odd number of bytes).

            case ASGN:
                //dumptree(p); fputc('\n',stderr);
                if(specific(p->op) == ASGN+B){ /* copy struct, or data block */
                    d = ireg[getregnum(p->x.kids[0])]->x.name;
                    s = ireg[getregnum(p->x.kids[1])]->x.name;
                    size = p->syms[0]->u.c.v.u;
                    align = p->syms[1]->u.c.v.u;
                    if(align==1 || size<2){
                        /* non aligned, or single byte copy: must do byte moves */
                        if(size > 8) 
                            print("$CPYB %s,%s,%d ; ASGNB\n",s,d,size);
                        else{
                            /* first byte is just register-indirect, not indexed */
                            print("movb  (%s), (%s) ; ASGNB\n",s,d);
                            for(i=1;i<=size-1;++i) 
                                print("movb %d(%s),%d(%s) ; ASGNB\n",i,s,i,d);
                        }
                    }else{
                        /* word aligned copy */
                        if(size > 16) 
                            print("$CPYW %s,%s,%d ; ASGNB\n",s,d,size);
                        else{
                            /* first word is just register-indirect, not indexed */
                            print("mov  (%s), (%s) ; ASGNB\n",s,d);
                            for(i=2;i<=size-2;i+=2) 
                                print("mov %d(%s),%d(%s) ; ASGNB\n",i,s,i,d);
                            if(size&1) /* if odd number, copy last byte */
                                print("movb %d(%s),%d(%s) ; ASGNB\n",i,s,i,d);
                        }
                    }
                }
                break;
    

    Emit ARG(BAND(X,CNST)). This is a hard-won optimisation like the examples earlier: we want the opportunity to BIC directly on top of stack rather than on a temporary location:

            case ARG: 
                if(optype(p->op) == F)
                    fpld(p, freg[getregnum(p->x.kids[0])]->x.name, "-(sp)");
                else{
                    /* stmt: ARGx2(BANDx2(reg,con)) */
                    //dumptree(p); fputc('\n',stderr);
                    assert( generic(p->kids[0]->op) == BAND 
                            && generic(p->kids[0]->kids[1]->op) == CNST );
                    src = ireg[getregnum(p->kids[0]->kids[0])];
                    print("mov %s,-(sp) ; ARGx(BANDx)\n",src->x.name);
                    emitband("(sp)", p->kids[0]->kids[1]->syms[0]->u.c.v.u);
                }
                break;
    

    Emit BAND(X,CNST). The idea here is simple enough. We complement the constant operand ourselves and emit a BIC instruction. We notice BAND(CVU(X),CNST) operations as well, because an unsigned zero-extend involves a bit mask. By combining both masks, we save one BIC.

    Ditto in the case of BAND(RSHU(X,CNST),CNST). The unsigned right shift involves masking off sign-extended bits, so we notice this, and combine the masks. (These unsigned operations occur in bit-field operations, so these type of optimisations are rather important.) Simple idea but a fair chunk of code:

            case BAND:
                //dumptree(p);fputc('\n',stderr);
                /* also note possible optimisation of low byte extract, where source and dest differ:
                    MOV src,dst; BIC ^O-400,dst = CLR dst; BISB src,dst */
                assert( generic(p->kids[1]->op) == CNST );
                dst = ireg[getregnum(p)];
                mask = p->kids[1]->syms[0]->u.c.v.u;
                if(generic(p->kids[0]->op) == CVU ){
                    /* BANDx2(CVUx2(reg),con) */
                    /* combine AND mask with shift-clear mask */
                    src = ireg[getregnum(p->kids[0]->kids[0])];
                    //fprintf(stderr,"emit2: %s <- BANDx(CVUx(%s),0%o)\n",dst->x.name,src->x.name,mask);
                    /* always move (to do sign extension) */
                    print("movb %s,%s ; BANDx(CVUx)\n", src->x.name, dst->x.name);
                    emitband(dst->x.name,mask & 0377);
                }else{
                    /* if LHS is an RSHU, we combine the two */
                    if(specific(p->kids[0]->op) == RSH+U){
                        /*reg:  BANDU2(RSHU2(reg,consh),con) */
                        //fprintf(stderr,"emit2: %s <- BANDU2(RSHU2(%s,consh),0%o)\n",dst->x.name,src->x.name,mask);
                        emitrshu(p->kids[0], dst->x.name, mask);
                    }else{
                        /* reg: BANDx(reg,con) */
                        src = ireg[getregnum(p->x.kids[0])];
                        //fprintf(stderr,"emit2: %s <- BANDU2(%s,0%o)\n",dst->x.name,src->x.name,mask);
                        if (dst != src)
                            print("mov %s,%s ; BANDx\n", src->x.name, dst->x.name);
                        emitband(dst->x.name,mask);
                    }
                }
                break;
    

    (I know of one bug in the machine description, relating to BAND(X,CNST), which I have not flushed out yet, hence that diagnostic.)

    Emit CALL(ADDRG). This is the minor optimisation of popping stacked arguments after a CALL, mentioned earlier. If no arguments, we emit JSR alone; if one word, we can pop it with a compact TST (SP)+; otherwise ADD a constant to the stack pointer.

            case CALL:
                /* CALLxx(ADDRGP2) */
                //dumptree(p);fputc('\n',stderr);
                //fprintf(stderr,"## emit2: CALL: p->syms[0]->u.c.v.u = %d\n", p->syms[0]->u.c.v.u);
        
                assert(generic(p->kids[0]->op) == ADDRG);
                print("jsr pc,%s ; emit2: CALL\n",p->kids[0]->syms[0]->x.name);
                i = p->syms[0]->u.c.v.u;
                if(i == 2) print("tst (sp)+ ; pop arg\n");
                else if(i == 4) print("cmp (sp)+,(sp)+ ; pop args\n");
                else if(i>0) print("add #%d,sp ; pop args\n",i);
                break;
    

    Emit RSHU(X,CNST). Emit ASH (arithmetic shift) followed by a bit mask to zero the sign extended bits, as we mentioned earlier. Note the following possible further optimisations (but these would be simple templates):

  • left shift by 8 bits = SWAB dst; CLRB dst
  • right shift by 8 bits = CLRB dst; SWAB dst
            case RSH:
                /* note the following possible optimisations:
                    left shift by 8 bits = SWAB dst; CLRB dst
                    right shft by 8 bits = CLRB dst; SWAB dst
                   these would be implemented by templates, for instance:
                    reg:    LSHx2(reg,con8) "swab %c\nclrb %c\n"
                    reg:    RSHU2(reg,con8) "clrb %c\nswab %c\n"
                */
            
                //dumptree(p);
        
                /* reg: RSHx(reg,con) */
                /* for unsigned, emit arithmetic shift, then mask out extended sign  */
                assert(generic(p->kids[1]->op) == CNST);
                
                shift = p->kids[1]->syms[0]->u.c.v.u;
                dst = ireg[getregnum(p)];
                if(optype(p->op) == U)
                    emitrshu(p, dst->x.name, ~0);
                else
                    print("ash #%d,%s ; emit2: RSHI\n",-shift,dst->x.name);
        
                break;
        }
    }
    
    static void doarg(Node p) {
        assert(p && p->syms[0]);
        mkactual(2, p->syms[0]->u.c.v.i);
    }
    
    // Block operators not needed
    static void blkfetch(int k, int off, int reg, int tmp) {}
    static void blkstore(int k, int off, int reg, int tmp) {}
    static void blkloop(int dreg, int doff, int sreg, int soff,int size, int tmps[]) {}
    

    The local function gives lcc the chance to put a local variable in a register. If this doesn't succeed, then we assign a stack offset and set its textual value for use in instruction templates.

    static void local(Symbol p) {
        /* always put long locals on stack frame */
        if (  (isint(p->type) && p->type->size==4 && !p->temporary) || askregvar(p, rmap(ttob(p->type))) == 0)
        {
            assert(p->sclass == AUTO);
            offset = roundup(offset + p->type->size,2);
            p->x.offset = -offset;
            /* note, all offsets must have sign in front, so we can pull dirty tricks
               with assembler address arithmetic later. */
            p->x.name = stringf("-%d",offset);
            //fprintf(stderr,"local(\"%s\") offset=%d\n",p->name,offset);
        }
    }
    

    This is a biggie. Fairly self-explanatory, through wonderful ASCII art. We work out the stack frame offsets of all formal parameters here, then emit:

  • function prologue: set up frame pointer and save all used registers
  • function body, and
  • epilogue to restore registers and frame pointer value.

    We are not concerned with the return value; the function body will have emitted any necessary instructions.

    The only point of note here is that R0 is hardwired as unused, since it's used for function result anyway and obviously must not be preserved to caller.

    static void function(Symbol f, Symbol caller[], Symbol callee[], int n) {
        int i,nreg,nfreg,nargs;
        long fused;
        Symbol fs;
        
        usedmask[IREG] = usedmask[FREG] = 0;
        freemask[IREG] = INTTMP|INTVAR;
        freemask[FREG] = FLTTMP|FLTVAR;
    
        maxoffset = offset = maxargoffset = 0;
    
        /* Determine offsets of parameters relative
           to frame pointer (set up in prologue).
           This assumes right-to-left pushing of parameters,
           to facilitate variable-argument functions like printf.
           With right-to-left, we always know the address
           of the first parameter, because it is the last pushed,
           so has highest address on stack and a fixed offset (4)
           relative to frame pointer. */
    
        offset = 4;  /* allow for link register word @ 0(SP), and saved FP, after JSR */
        for (nargs = 0; callee[nargs]; nargs++) {
            Symbol p = callee[nargs];
            Symbol q = caller[nargs];
            p->x.offset = q->x.offset = offset;
            p->x.name = q->x.name = stringf("%d", p->x.offset);
            p->sclass = q->sclass = AUTO;
            //fprintf(stderr,"callee/caller[%d]: %s offset=%d\n", nargs,p->name,offset);
            offset += roundup(q->type->size,2);  // PDP-11 system stack is always word aligned
        }
        assert(caller[nargs] == 0);
        offset = maxoffset = 0;
        
        //fputs("...calling gencode()\n",stderr);
        gencode(caller, callee);
        
        framesize = roundup(maxoffset,2);
    
        // prologue...
        
        
        /*      layout of system stack frame:
    
                |               |   higher addresses
                +---------------+
                |   argN        |
                |   ...         |
                |   arg0        |   <- FP+4
                +---------------+
                |   link reg    |   <- FP+2 = SP after JSR
                +===============+
                |   saved FP    |   <- FP after prologue
                +---------------+
              / |   locals      |   <- FP-2
    framesize \ |   ...         |
                +---------------+
                |   saved regs  |
                |   ...         |   <- SP after prologue
                +---------------+
                |               |   lower addresses
                
                (Without a dedicated frame pointer, stack offset bookkeeping 
                is unmanageable while parameters are being stacked for a CALL.)
                
                This layout is very similar to that used by RSX-11:
                http://www.ibiblio.org/pub/academic/computer-science/history/pdp-11/language/decus-C/5,7/cx.rno
                
                See here for a survey of subroutine call frame conventions:
                http://www.cs.clemson.edu/~mark/subroutines.html
                http://www.cs.clemson.edu/~mark/subroutines/pdp11.html (PDP-11 specific)
                and here http://cm.bell-labs.com/cm/cs/who/dmr/clcs.html (original PDP-11 C)
        */
        
        segment(CODE);
        print(".SBTTL %s\n",f->x.name);
        print("%s:\n",f->x.name);
        if(nargs||framesize){
            print("mov fp,-(sp) ; save frame pointer\n");
            print("mov sp,fp\n"); /* setup frame pointer */
            if(framesize == 2) 
                print("clr -(sp) ; alloc stack frame\n");
            else if(framesize > 2) 
                print("sub #%d,sp ; alloc stack frame\n",framesize);
        }
    
        /* don't bother saving scratch registers,
           which are assumed to include return register (R0) */
        usedmask[IREG] &= ~INTTMP; 
        fused = usedmask[FREG];
        usedmask[FREG] &= ~FLTTMP;   
    
        /* save used registers on stack, below stack frame */
        fprintf(stderr,"%16s used=",f->name);
        for( i=usedmask[IREG],nreg=0 ; i ; i>>=1,++nreg )
            if(i&1){
                fputc(' ',stderr);
                fputs(ireg[nreg]->x.name,stderr);
                print("mov %s,-(sp) ; save\n",ireg[nreg]->x.name);
            }else 
                fputs("   ",stderr);
    
        /* note we also need to save and restore FPP state,
           if any floating point is done in the function */
    
        // call front end to emit function body
        //fputs("...calling emitcode()\n",stderr);
    
        print(";\n");
        emitcode();
    
        // epilogue...
    
        print(";\n");
        
        /* restore used registers, in reverse order from save */
        for(i=usedmask[IREG] ; nreg-- ; )
            if(i & (1<x.name);
    
        if(nargs||framesize){
            if(framesize) print("mov fp,sp ; pop stack frame\n");
            print("mov (sp)+,fp ; restore frame pointer\n");
        }
        
    
        print("rts pc\n\n");
    }
    

    Simple hack here: MACRO-11 restricts identifiers to A-Z, 0-9, '.' and '$'. C obviously allows underscores, so we do a simple-minded transliteration. It should be noted that MACRO-11 (and moreso the linker) will complain about identifiers that are not distinct in the first six characters. This will likely mandate changes to code that is ported to this target (abbreviation is good for the soul...)

    static void defsymbol(Symbol p) {
        long v;
        char *q;
    
        if(p->scope >= LOCAL && p->sclass == STATIC)
            p->x.name = stringf("%d$", genlabel(1));
        else if (p->generated)
            p->x.name = stringf("%s$", p->name);
        else if(p->scope == CONSTANTS && isint(p->type)){
            /* make sure constants are literal decimal (not hex or octal) */
            p->x.name = isunsigned(p->type) ? stringf("%u",p->u.c.v.u) : stringf("%d",p->u.c.v.i);
        }else if(p->scope == GLOBAL || p->sclass == EXTERN){
            /* underscores not allowed in MACRO-11 identifiers; replace with $ */
            q = p->x.name = strdup(p->name);
            for( ; *q ; ++q )
                if(*q == '_')
                    *q = '$';
        }
    
        //fprintf(stderr,"defsymbol(%s = %s)\n",p->x.name,p->name);
    }
    

    I can think of nothing better than to quote the lcc bible here.

    initialize p to a symbol that represents an address of the form x+n, where x is the address represented by q and n is positive or negative. Like defsymbol, address initializes p->x, but it does so based on the values of q->x and n. A typical address adds q's stack offset to n for locals and parameters, and sets p->x.name to q->x.name concatenated with +n or -n for other variables. ... address accepts globals, parameters, and locals, and is called only after these symbols have been initialized by defsymbol, function, or local. Also, address is optional: If the function pointer in the interface record is null, the front end will not call address.
    Is that clear?
    static void address(Symbol q, Symbol p, long n) {
        q->x.offset = p->x.offset + n;
        q->x.name = stringf("%s%s%d",p->x.name,n<0?"":"+",n); 
        
        /*  if it's a frame variable, the basic name is +n(fp) or -n(fp)
            if it's a global, the basic name is just identifier: g
            we need to produce: +o+n(fp) or +o+g  respectively.
            Must have a leading +/- because we may want to textually prefix this with 
            yet another offset, in instruction templates. */
        //q->x.name = stringf("%s%d%s%s", n<0?"":"+",n,(p->x.name[0]=='+' || p->x.name[0]=='-')?"":"+",p->x.name); 
        //fprintf(stderr,"address(%s, %s, %d)\n",q->x.name,p->x.name,n);
    }
    

    Nothing too tricky about the remaining functions.

    static void defconst(int suffix, int size, Value v) {
        long val = suffix==I ? v.i : v.u;
        char s[32];
        switch(suffix){
        case F:
            sprintf(s,"%.18e",v.d);
            print(".FLT%d %s\n",size,s);
            break;
        case I: 
        case U: 
            if(size==1) 
                print(".BYTE %d\n.EVEN\n",v.i);
            else if(size==2) 
                print(".WORD %d\n",v.i);
            else if(size==4) 
                print(".WORD %u ; lo\n.WORD %u ; hi\n",v.u & 0xffff,v.u>>16);
            break;
        case P: 
            print(".WORD ^o%o ; ptr\n",v.p); 
            break;
        }
    }
    
    static void defaddress(Symbol p) {
        print(".WORD %s\n",p->x.name);
    }
    

    Don't be tempted to change this to a "string" directive (such as .ASCII): it can cause serious problems. For instance, some PDP-11 console environments fold case from the terminal. If we emit a string, the contents of the string get folded. If we emit numeric constants (as below), any such insidious data modification is prevented. This could be a crucial point for strings where changing case is not merely cosmetic (very easy to imagine examples).

    Only put 8 bytes per line because that fits MACRO-11 listings nicely.

    static void defstring(int n, char *str) {
        int i;
        for(i=0;ix.name);
    }
    
    static void import(Symbol p) {
    }
    

    Simply using a double colon (e.g. MAIN::) makes an identifier global, but then we lose the ability to distinguish identifiers that are not supposed to be global (e.g. statics). So we assume all identifiers are local to the module, and allow lcc to call global above for those that are not static.

    static void global(Symbol p) {
        print("%s:\n", p->x.name);
    }
    
    static void space(int n) {
        print(".REPT %d\n.BYTE 0\n.ENDR\n",n);
        print(".EVEN\n");
    }
    

    The mother ship: The Interface Record. This defines the sizes and alignments of our types, and other machine specific parameters. In particular we specify that function arguments are pushed right-to-left, which facilitates variable-argument calling (such as printf).

    See interface4.pdf for details of the interface record.

    Interface pdp11IR = {
        1, 1, 0,  /* char */
        2, 2, 0,  /* short */
        2, 2, 0,  /* int */
        4, 2, 1,  /* long */
        4, 2, 1,  /* long long */
        4, 2, 1,  /* float */
        8, 2, 1,  /* double */
        8, 2, 1,  /* long double */
        2, 2, 0,  /* T * */
        0, 2, 1,  /* struct */
    
        1,        /* little_endian */
        1,        /* mulops_calls */
        0,        /* wants_callb */
        0,        /* wants_argb */
        0,        /* left_to_right */
        0,        /* wants_dag */
        0,        /* unsigned_char */
    
        address,
        blockbeg,
        blockend,
        defaddress,
        defconst,
        defstring,
        defsymbol,
        emit,
        export,
        function,
        gen,
        global,
        import,
        local,
        progbeg,
        progend,
        segment,
        space,
        0, 0, 0, 0, 0, 0, 0,
        {   1, /* max_unaligned_load */
            rmap,
            blkfetch, blkstore, blkloop,
            _label,
            _rule,
            _nts,
            _kids,
            _string,
            _templates,
            _isinstruction,
            _ntname,
            emit2,
            doarg,
            target,
            clobber,
        }
    };
    

    That's all folks, until I get floating point and long arithmetic sorted out. I have a couple more targets in the wings too.

    How to integrate with lcc

    (These instructions are not needed if you simply check out the complete Subversion source tree.)

    Download and extract lcc 4.2. All instructions assume current directory is the unpacked lcc-4.2 directory.

    Place pdp11.md in src subdirectory.

    Some changes must be made to the makefile:

    Optionally, you can change the compiler flags to add optimisation, and suppress a lot of build warnings:

    CFLAGS=-O2 -Wno-long-double
    
    To the RCCOBJS list, add pdp11 object like so:
    RCCOBJS=$Balloc$O \
    	$Bbind$O \
    	... many lines omitted ...
    	$Bx86$O \
    	$Bx86linux$O \
    	$Bpdp11$O
    

    (All lines in the list should end with '\' except the last.)

    After the $Bx86linux$O: line, add a line to compile pdp11.c:

    $Bx86linux$O:   $Bx86linux.c;   $(CC) $(CFLAGS) -c -Isrc -o $@ $Bx86linux.c
    $Bpdp11$O:	$Bpdp11.c;	$(CC) $(CFLAGS) -c -Isrc -o $@ $Bpdp11.c
    

    And similarly to process pdp11.md:

    $Bx86linux.c:   $Blburg$E src/x86linux.md; $Blburg src/x86linux.md $@
    $Bpdp11.c:	$Blburg$E src/pdp11.md; $Blburg src/pdp11.md $@
    

    Modify src/bind.c after the line for bytecode:

    xx(bytecode,     bytecodeIR) \
    xx(pdp11,    pdp11IR) \
    

    It may be necessary to remove keyword "static" from line 80 of src/bytecode.c, because pdp11.md uses the dumptree call to aid debugging.

    void dumptree(Node p) {
    

    Also, a custom "HOSTFILE" for the platform is required. My etc/pdp11.c is unfinished, but sufficient to allow cross-compile testing of the back-end. It belongs in the etc subdirectory of the lcc-4.2 directory. I also create a pdp11 directory where the objects and executables live. A build session looks like this:

    mkdir pdp11
    export LCCDIR=`pwd`/pdp11
    export BUILDDIR=$LCCDIR
    
    HOSTFILE=etc/pdp11.c make lcc
    make all
    
    pdp11/lcc -S test.c -target=pdp11
    

    (If you build the compiler under OS X and get "Bad system call" when you try to run it, I can't help you. Workaround: use Linux.)

    Sample MACRO-11 listing

          7 000000                          COUNTBITS:
          8 000000  010546                  MOV FP,-(SP) ; SAVE FRAME POINTER
          9 000002  010605                  MOV SP,FP
         10 000004  010246                  MOV R2,-(SP) ; SAVE
         11 000006  010346                  MOV R3,-(SP) ; SAVE
         12                                 ;
         13 000010  016503  000004          MOV 4(FP),R3 ; REG <- OPD
         14 000014  005002                  CLR R2
         15 000016  000406                  BR 5$
         16 000020                          2$:
         17 000020  032703  000001          BIT  #1,R3
         18 000024  001401                  BEQ 6$
         19 000026  005202                  INC R2
         20 000030                          6$:
         21 000030                          3$:
         22 000030  000241                  CLC
         23 000032  006003                  ROR R3
         24 000034                          5$:
         25 000034  005703                  TST R3
         26 000036  001370                  BNE 2$
         27 000040  010200                  MOV R2,R0 ; LOADU2
         28 000042                          1$:
         29                                 ;
         30 000042  012603                  MOV (SP)+,R3 ; RESTORE
         31 000044  012602                  MOV (SP)+,R2 ; RESTORE
         32 000046  012605                  MOV (SP)+,FP ; RESTORE FRAME POINTER
         33 000050  000207                  RTS PC
    

    (Here's a longer example, including a test run under RT-11.)