NLnet 2024-12-324 Grant Progress

€ 50000 Libre-Chip's First CPU Architecture And Formal Proof of No Spectre bugs

Modern computers suffer from a constant stream of new speculative-execution security flaws (Spectre-style bugs). To address this major category of flaws, we are working towards building a high-performance computer processor (CPU) with speculative execution and working on a mathematical proof that it doesn't suffer from any speculative-execution data leaks, thereby demonstrating that this major category of flaws can be eliminated without crippling the computer's performance.

https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline

€ 6000 Adding Rocq output in Fayalite

This is for adding the code for translating Fayalite HDL to Rocq, as well as determining exactly how we'll describe HDL in Rocq. I expect the translation code to be of comparable size to the compiler portion of the simulator, so somewhere around 5000 LoC. (The simulator is broken into three main parts: a compiler to an IR optimized for interpretation, the interpreter itself, and the code for reading/writing simulator I/O and handling time simulation.)

  • € 2000 Issue #N Figure out exactly how we should represent HDL in Rocq, writing down manually translated versions of common HDL components (e.g. how to translate a register, a memory, or an add/sub/mul/div).
  • € 4000 Issue #N Write the code to do the translation in Fayalite.

€ 4000 Adding supporting code for generating FPGA bitstreams from Fayalite

This is for adding the tooling to run all the right programs to generate FPGA bitstreams, as well as adding code to handle connecting I/O ports to the FPGA pins. I expect this to be on the order of 1000-2000 LoC for the FPGA pins code, as well as a few hundred for running all the right programs in sequence.

  • € 2000 Issue #N Write support for board interface descriptions and the code for running the FPGA toolchain (similar to the existing code for running SymbiYosys -- the current formal verification toolchain).
  • € 1000 Issue #N Add support for the OrangeCrab since both Cesar and Jacob have one.
  • € 1000 Issue #N Add support for the Arty A7 100T since that's what we're using for CI.

€ 10000 Register Renaming, Execution, and Instruction Retire

This covers getting register renaming working, as well as scheduling, executing simple ALU and Branch instructions, and properly handling instruction retire. (Some of that work is already done.)

Much of this is the work of coming up with a detailed low-level plan for the CPU, so I don't yet have a good idea of how complex it will be, though I expect it to be roughly 40% of the CPU's complexity.

  • € 1500 Issue #N Add to the simulator in Fayalite the ability to transfer non-HDL data (e.g. HashMap) through the digital signalling mechanism. This allows using those data types when writing procedural models.
  • € 6000 Issue #N Create a model of the whole rename/execute/retire control system, using procedural implementations of the most complex HDL modules where appropriate.
  • € 2500 Issue #N Translate the procedural model to use actual synthesizable HDL. Includes a proof of correctness of the out-of-order CPU in relation to a sequential CPU (probably most easily done by adding the proof to the retire stage).

€ 8000 Instruction Fetch/Decode

This covers instruction fetch, decoding, and caching. For the decoder, unless OpenPower has gotten around to releasing the LaTeX source code, I'm expecting to use a parser I wrote that parses the instruction descriptions out of the PowerISA v3.1C PDF and writes out XML.

https://git.libre-chip.org/libre-chip/parse_powerisa_pdf

  • € 1000 Issue #N Create the next-instruction logic -- includes some sort of branch prediction or branch target buffer so we can actually keep the rest of the CPU pipeline full. This should support fetching more than one instruction per clock.
  • € 1000 Issue #N Create the fetch and i-cache logic.
  • € 2000 Issue #N Create the PowerISA decoder -- it translates to the internal microcode. For now, it only needs to support a reasonable subset of 64-bit LE integer instructions in problem mode (a.k.a. user mode); FP and VMX/VSX can be disabled.
  • € 2000 Issue #N Create a model of the instruction fetch/decode control system, using procedural implementations of the most complex HDL modules where appropriate.
  • € 2000 Issue #N Translate the procedural model to use actual synthesizable HDL.
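
As a rough illustration of the next-instruction logic described in the first item above, here is a minimal behavioral sketch of a direct-mapped branch target buffer that predicts a fetch group of several instructions per clock. It is plain Rust rather than Fayalite HDL, and every name in it (`NextPc`, `BtbEntry`, `predict`, `update`) is hypothetical. The fixed 4-byte instruction size is a simplifying assumption that ignores PowerISA v3.1 prefixed instructions, and the actual design may use a different predictor entirely.

```rust
/// One entry of a direct-mapped branch target buffer (BTB).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BtbEntry {
    tag: u64,    // full PC of the branch (real hardware would store fewer bits)
    target: u64, // predicted target of the taken branch
}

/// Hypothetical next-PC predictor: a BTB indexed by the low PC bits.
struct NextPc {
    entries: Vec<Option<BtbEntry>>,
}

/// Assumption: every instruction is 4 bytes (no prefixed instructions).
const INSN_BYTES: u64 = 4;

impl NextPc {
    fn new(size: usize) -> Self {
        assert!(size.is_power_of_two());
        NextPc { entries: vec![None; size] }
    }

    fn index(&self, pc: u64) -> usize {
        ((pc / INSN_BYTES) as usize) & (self.entries.len() - 1)
    }

    /// Predict a fetch group of up to `width` PCs starting at `pc`.
    /// The group ends early at the first predicted-taken branch.
    /// Returns the PCs in the group and the PC to fetch from next.
    fn predict(&self, pc: u64, width: usize) -> (Vec<u64>, u64) {
        let mut group = Vec::with_capacity(width);
        let mut cur = pc;
        for _ in 0..width {
            group.push(cur);
            match self.entries[self.index(cur)] {
                Some(e) if e.tag == cur => return (group, e.target),
                _ => cur = cur.wrapping_add(INSN_BYTES),
            }
        }
        (group, cur)
    }

    /// Train the BTB once a branch at `pc` resolves.
    /// `Some(target)` means taken; `None` means not taken.
    fn update(&mut self, pc: u64, taken_target: Option<u64>) {
        let i = self.index(pc);
        match taken_target {
            Some(target) => self.entries[i] = Some(BtbEntry { tag: pc, target }),
            None => {
                // Only drop the entry if it really belongs to this branch.
                if matches!(self.entries[i], Some(e) if e.tag == pc) {
                    self.entries[i] = None;
                }
            }
        }
    }
}
```

With an empty BTB, `predict(0x100, 4)` yields four sequential PCs and continues at `0x110`; after `update(0x104, Some(0x200))`, the same call stops the group at `0x104` and redirects fetch to `0x200`.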

€ 10000 Load/Store instructions

This covers implementing the load/store hierarchy, including an L1 cache. For now, the CPU will only target on-FPGA memory blocks, as well as simple I/O devices. (Support for DRAM can be added at a later point outside of this grant.) It should include d-cache, some kind of memory, and at least one IO device.

It should include at least lr/sc, some atomic fetch-op, cached load/store, and IO load/store (IO needs to wait until non-speculative to start executing).

  • € 1000 Issue #N memory system: main memory and IO devices. I'm expecting just a big SRAM to be good enough for simulation of memory; on the FPGA we could probably get away with a relatively small SRAM and put off a DRAM interface for later. For the IO device, I'm thinking we'd have a simple fixed-frequency UART.
  • € 1000 Issue #N d-cache
  • € 2500 Issue #N memory load execution unit (we'll want to be able to do more than one load at once)
  • € 2500 Issue #N memory store execution unit
  • € 2000 Issue #N adding atomics: lr/sc, atomic fetch-add (or other fetch-op)
  • € 1000 Issue #N adding order-violation detection logic, so we can make memory look like it has total-store-order (for x86), or even sequential consistency (meaning we can ignore all non-IO fences)
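
One common way to do the order-violation detection mentioned in the last item is to remember which cache lines speculatively executed loads have read, and to squash from the oldest affected load when another agent writes one of those lines before the load retires. The sketch below models only that bookkeeping in plain Rust. `LoadTracker`, its methods, and the 64-byte line size are all assumptions made for illustration, not the project's design.

```rust
use std::collections::BTreeMap;

/// Assumption: 64-byte cache lines.
const LINE_BYTES: u64 = 64;

/// Hypothetical tracker for loads that have executed but not yet retired.
struct LoadTracker {
    /// program-order sequence number -> cache line read by that load
    in_flight: BTreeMap<u64, u64>,
}

impl LoadTracker {
    fn new() -> Self {
        LoadTracker { in_flight: BTreeMap::new() }
    }

    /// A load with sequence number `seq` speculatively read `addr`.
    fn load_executed(&mut self, seq: u64, addr: u64) {
        self.in_flight.insert(seq, addr / LINE_BYTES);
    }

    /// The load retired, so its position in memory order is now fixed.
    fn load_retired(&mut self, seq: u64) {
        self.in_flight.remove(&seq);
    }

    /// Another agent wrote `addr`. If an in-flight load already read that
    /// line, return the sequence number of the oldest such load; it and
    /// every younger load are dropped from the tracker, since they must be
    /// squashed and re-executed.
    fn external_write(&mut self, addr: u64) -> Option<u64> {
        let line = addr / LINE_BYTES;
        let oldest = self
            .in_flight
            .iter()
            .find(|(_, l)| **l == line)
            .map(|(seq, _)| *seq)?;
        // Keep only loads older than the violating one.
        let _squashed = self.in_flight.split_off(&oldest);
        Some(oldest)
    }
}
```

Because a violating load is simply re-executed, software only ever observes load results that are consistent with the stronger memory model, which is why non-IO fences could then be ignored.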

€ 12000 Work towards the Formal Proof of No Spectre bugs

This covers working on the Formal Proof of No Spectre bugs as well as improvements to other software/hardware needed for that. A major portion of this is to figure out what exact properties we need to formally prove for each part of the CPU, so we can put those parts together to make a working proof for the entire CPU. I'm hoping this will be enough work to get nearly all of the proof written out, even if there are some flaws we discover that we'll have to put off fixing for a later grant.

  • € 3000 Issue #N Write Rocq and HDL logic for tracking which instructions will eventually be cancelled and which will eventually be retired.
  • € 9000 Issue #N Attempt a proof that our CPU, but with zeroed outputs for all eventually-cancelled instructions, is equivalent to our real CPU design. This may need significant modifications to the CPU. This task may be too big and may need further subdividing once we've made enough progress on the proof to know where to subdivide it.
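
To make the shape of that property concrete, here is a toy model in plain Rust. It is not the Rocq formalization and not the real CPU, and all names (`Step`, `Fate`, `fates`, `run`) are hypothetical. A first pass labels each speculative operation with its eventual fate, and a second pass runs the trace either normally or with the outputs of eventually-cancelled operations forced to zero. In this toy, the final register state is the same either way, because cancelled results are always rolled back before any retired operation can read them.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Fate {
    Retired,
    Cancelled,
}

#[derive(Clone, Copy, Debug)]
enum Step {
    /// Speculatively execute regs[dest] = regs[src] + imm.
    Op { dest: usize, src: usize, imm: u64 },
    /// A misprediction was found: cancel the `count` youngest in-flight ops.
    Squash { count: usize },
}

/// Pass 1: label every step with its eventual fate, using knowledge of the
/// whole trace. `Squash` steps keep the placeholder label `Retired`.
fn fates(trace: &[Step]) -> Vec<Fate> {
    let mut fate = vec![Fate::Retired; trace.len()];
    let mut in_flight = Vec::new(); // indices of ops not (yet) cancelled
    for (i, step) in trace.iter().enumerate() {
        match *step {
            Step::Op { .. } => in_flight.push(i),
            Step::Squash { count } => {
                for _ in 0..count {
                    let idx = in_flight.pop().expect("squashed more ops than in flight");
                    fate[idx] = Fate::Cancelled;
                }
            }
        }
    }
    fate
}

/// Pass 2: run the trace. If `zero_cancelled` is set, the output of every
/// eventually-cancelled op is forced to 0 instead of its real value.
fn run(trace: &[Step], fate: &[Fate], zero_cancelled: bool, num_regs: usize) -> Vec<u64> {
    let mut regs = vec![0u64; num_regs];
    let mut undo = Vec::new(); // (dest, previous value) for each in-flight op
    for (i, step) in trace.iter().enumerate() {
        match *step {
            Step::Op { dest, src, imm } => {
                let mut value = regs[src].wrapping_add(imm);
                if zero_cancelled && fate[i] == Fate::Cancelled {
                    value = 0;
                }
                undo.push((dest, regs[dest]));
                regs[dest] = value;
            }
            Step::Squash { count } => {
                for _ in 0..count {
                    let (dest, old) = undo.pop().expect("squashed more ops than in flight");
                    regs[dest] = old;
                }
            }
        }
    }
    regs
}
```

Checking `run(&trace, &f, false, n) == run(&trace, &f, true, n)` on sample traces is only a sanity test of the idea. The grant task is to prove the corresponding statement for the real hardware, where the outputs that matter also include timing and other side channels.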