Modern Problems
for the Smalltalk VM

Boris Shingarov

www.shingarov.com

The need for new scopes

  • Fundamental design of Smalltalk VM stable over years
  • New scopes = major source of new progress
  • New classes of complexity get in the way of gaining insight

VM workload complexity

  • Typical modern VM bug
    • parallelism, race conditions, ...
    • aggressive optimizations by today's compilers
  • Overall: can't observe the VM using uniprocessor-era approaches

The need to use models

  • Developing on the actual microprocessor is a bad idea
  • Use formal models of the target processors and tools for reasoning about those models; make the tools aware of Smalltalk

VM target complexity

  • Post-uniprocessor-era (SoC, multi/manycore, ASIP) very different from uniprocessor designs
  • Power wall / Memory wall / ILP wall / Design cost wall / Software Legacy Wall
  • 3 - 4 general-purpose ISAs → SoC design specifics; dozens of ASIP processors
  • ARM ARM v.8A.a: 5158 pages of natural language
  • Processor designer's understanding diverges from the compiler backend (in our case, JIT) designer's understanding
  • "Portability" = impossible

Synthesis from common PDL

  • Processor Description Languages
  • Invented by Zimmerman and Marwedel at the Kiel Radioastronomical Observatory in the 1970s
  • Many available to choose from, considering tradeoffs
  • PDL → HDL → Silicon
  • PDL → HDL → FPGA
  • PDL → Simulator "Smalltalk-aware"
  • PDL → Smalltalk JIT

Experiment 1: PDL → Smalltalk JIT

  • Start from ArchC PDL
  • Compile PDL to "Machine Tables"
  • Limited form of Cattell's algorithm for instruction selection
  • Existing ArchC descriptions of SPARC, ARM, PowerPC, MIPS
  • No need to understand the whole ISA (+ABI, +optimization manual...)
  • Automatic retargeting of the JIT
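
As a rough illustration of what compiling a processor description into machine tables could look like, the Python sketch below parses a bit-field format string written in the style of an ArchC ac_format declaration and uses the result to encode one instruction. The format string, the parse_format and encode helpers, and the choice of the MIPS R-type add instruction are assumptions made for this example; none of them is taken from the tool described in the talk.

```python
# Hypothetical format description in the style of an ArchC ac_format string:
# each field is "%name:width", listed from most to least significant bit.
TYPE_R = "%op:6 %rs:5 %rt:5 %rd:5 %shamt:5 %func:6"

def parse_format(fmt):
    """Turn a format string into a list of (field name, bit width) pairs."""
    fields = []
    for token in fmt.split():
        name, width = token.lstrip("%").split(":")
        fields.append((name, int(width)))
    return fields

def encode(fields, values):
    """Pack field values into one instruction word, first field highest."""
    word = 0
    for name, width in fields:
        value = values.get(name, 0)  # unspecified fields default to zero
        if not 0 <= value < (1 << width):
            raise ValueError("%s=%d does not fit in %d bits" % (name, value, width))
        word = (word << width) | value
    return word

fields = parse_format(TYPE_R)
# MIPS "add $3, $1, $2": op=0, func=0x20, rd=3, rs=1, rt=2.
add_word = encode(fields, {"op": 0, "rs": 1, "rt": 2, "rd": 3, "func": 0x20})
print(hex(add_word))
```

The point of the sketch is that the encoder contains no knowledge of any particular ISA: everything specific to the target comes from the description string, which is what makes automatic retargeting plausible.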

Experiment 1 — results

  • Processor-agnostic JIT, automatically retargets to the 4 available ISAs, partially implemented
  • Problem 1: extracting the instruction semantics causes potential internal divergence between ArchC and ACCGen processor descriptions
  • Problem 2: ABI is not modeled as part of the processor description

Experiment 2: Smalltalk-aware simulation

  • Full-System Simulators
  • New classes of insight into program execution
  • Repeatable and Reversible
  • Out-of-band observation
  • Modeling of microarchitectural phenomena
    • MAI
    • TFSim
  • Limitation 1: the Python implementation does not automatically reflect VM code (e.g. data structures); a C module recompiled with VM headers would provide reflection

For illustration...

  • To get ourselves into the simulator's interface at some interesting point in the running JIT
  • A "Magic Instruction" causes simulation breakpoint
  • Example of program-to-simulator signaling
  • Simics Magic Instructions on different architectures:
    Target            Magic instruction
    x86               xchg %bx, %bx       encoding: 66 87 DB
    ARM               orreq rn, rn, rn    0 <= n < 15
    PowerPC 32-bit    mr n, n             0 <= n < 32
    PowerPC 64-bit    fmr n, n            0 <= n < 32
    SPARC             sethi n, %g0        1 <= n < 0x400000
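
To make the x86 row concrete, here is a minimal Python sketch that looks for the byte sequence 66 87 DB in a buffer of machine code. The find_magic helper and the sample buffer are hypothetical. A simulator would recognise the instruction while decoding rather than by scanning bytes, so a plain scan like this one can report false positives where the same bytes happen to sit inside another instruction's operand.

```python
X86_MAGIC = bytes([0x66, 0x87, 0xDB])  # xchg %bx, %bx

def find_magic(code, pattern=X86_MAGIC):
    """Return every offset at which the magic byte sequence occurs."""
    offsets = []
    start = code.find(pattern)
    while start != -1:
        offsets.append(start)
        start = code.find(pattern, start + 1)
    return offsets

# Hypothetical code buffer: nop, magic, nop, nop, magic.
sample = bytes([0x90, 0x66, 0x87, 0xDB, 0x90, 0x90, 0x66, 0x87, 0xDB])
print(find_magic(sample))
```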

Modify the JIT translator

  • Illustration — continued
  • NB: No inherent need for any intrusive [in-band] modification
  • CogIA32Compiler>>concretizeMagic
  • CogIA32Compiler>>dispatchConcretize
  • CogRTLOpcodes>>initialize
    Add "Magic" to the end and send #initialize. Now our abstract RTL has the magic instruction.
  • Cogit>>Magic
     <inline:true>
     <returnTypeC:#'AbstractInstruction*'>
     ^self gen: Magic

Use the instruction somewhere...

Once the new instruction is defined, we can modify the JIT to emit it. For illustration purposes, emit it at the start of this method:
genGetClassFormatOfNonInt: instReg
    into: destReg
    scratchReg: scratchReg
  "Fetch the instance's class format into destReg,
   assuming the object is non-int."
  | jumpCompact jumpGotClass |
  <var: #jumpCompact type: #'AbstractInstruction *'>
  <var: #jumpGotClass type: #'AbstractInstruction *'>
  cogit Magic. "THIS WILL STOP SIMULATION"
  "Get header word in destReg"
  cogit MoveMw: 0 r: instReg R: destReg.
  ... "rest of method"

Looking at the VM's state

simics> pregs
32-bit legacy protected mode
eax = 0x00000001, ax = 0x0001, ah = 0x00, al = 0x01
ecx = 0x00000006, cx = 0x0006, ch = 0x00, cl = 0x06
edx = 0x944f9540, dx = 0x9540, dh = 0x95, dl = 0x40
ebx = 0x00000004, bx = 0x0004, bh = 0x00, bl = 0x04
esp = 0xbf869670, sp = 0x9670
ebp = 0xbf869680, bp = 0x9680
esi = 0x00000003, si = 0x0003
edi = 0x00000003, di = 0x0003

eip = 0x93cff260, linear = 0x93cff260

eflags = 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 = 0x00000202
receiver in %EDX

Looking at the VM's state (cont.)

simics> x l:0x944f9540 4
l:0x944f9540  1d04 9a12      ← receiver's object header

class oop in header word 2, offset -4:

simics> x l:0x944f953C 4
l:0x944f9530     813e 1194    ← class oop
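
The two dumped words can be decoded by hand. The Python sketch below assumes that the dump shows bytes in memory order on a little-endian x86 target, and that the base header uses the pre-Spur Squeak layout (header type in bits 0-1, size in words in bits 2-7, format in bits 8-11, compact class index in bits 12-16). That field layout is an assumption brought in for this example and is not stated on the slides; the le32 helper is hypothetical.

```python
import struct

def le32(hex_bytes):
    """Interpret 8 hex digits, given in memory order, as a little-endian word."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]

header = le32("1d049a12")      # word at the receiver's oop
class_word = le32("813e1194")  # word at oop - 4

# Assumed field layout of the base header (pre-Spur Squeak object format).
header_type = header & 0x3
size_in_words = (header >> 2) & 0x3F
obj_format = (header >> 8) & 0xF
compact_class_index = (header >> 12) & 0x1F

# Clear the two low bits of header word 2 to obtain the class oop.
class_oop = class_word & 0xFFFFFFFC

print(hex(header), header_type, size_in_words, obj_format, compact_class_index)
print(hex(class_oop))
```

Under these assumptions the header type is not 3, so the object does not use a compact class and its class oop is found in the word at offset -4.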

A rudimentary Smalltalk module

Look at the object memory using Simics Python API

def print_class_of_oop(oop):
    # Python 2 syntax. SmallIntegers are tagged immediates: low bit set.
    if (oop & 1) == 1:
        print "SmallInteger"
    else:
        headerType = smalltalk_headerType(oop)
        if headerType == 3:
            print "...looks like compact class..."
        else:
            # The class oop is in header word 2, at offset -4 from the oop.
            word2 = read_virt_value(oop - 4, 4)
            classOop = word2 & 0xFFFFFFFC  # clear the two low bits
            print "class oop: ", hex(classOop)
            classNameOop = read_virt_value(classOop + 32, 4)
            print "class name oop: ", hex(classNameOop)
            # Read the name one byte at a time, skipping the 4-byte header.
            name = ""
            for offset in range(smalltalk_objByteSize(classNameOop)):
                name += "%c" % read_virt_value(classNameOop + 4 + offset, 1)
            print name

Make it into a command...

new_command("print-class-of-oop", print_class_of_oop,
            [arg(int_t, "oop")],
            type = "Debugging",
            see_also = [],
            short = "describe an oop",
            doc = """
Print the class of oop.""")

Try printing classes of some OOPs...

simics> print-class-of-oop %edx
class oop:  0x94113E80L
class name oop:  0x93F401CCL
WeakAnnouncementSubscription
simics> print-class-of-oop 0x93F401CC
class oop:  0x940FB4B4L
class name oop:  0x93E94140L
ByteSymbol

Experiment 3

Smalltalk instrumentation of an actual processor

  • Existing instrumentation interfaces: EJTAG (MIPS), ETM (ARM)
  • The above are not available to us
  • OpenSPARC
  • ...but not quite:
  • the FPGA implementation offloads major parts (CCX, FPU, ...) from Verilog to software running on the service processor

Experiment 3 (cont.)

  • all memory access through the CCX
  • CCX written in C; can take VM headers to parse memory and reason about what the VM is doing on the running processor