## Reading
- Learning value functions: neural network weight updates using gradient descent (see the sketch after this list).
- Chapter on RL from the book *Artificial Intelligence*, a free book from Columbia University; useful if you want a compact, different view than the standard RL book by Sutton and Barto.
- An object-oriented approach to linear neural networks, from an open-source book about deep learning; it does not implement neural networks from scratch but builds on many popular ML libraries.
- Implements neural networks from scratch in Python; not free. I once prepared slides for a short lecture based on the book; here is the code used in the slides.
- Similar to the previous one; not free, but has many useful animations.
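To make the first reading item concrete, here is a minimal sketch of learning a value function by gradient descent: semi-gradient TD(0) with a linear approximator. The feature vectors, rewards, and step size below are hypothetical placeholders, not taken from the reading.

```python
import numpy as np

def semi_gradient_td0(features, rewards, next_features, gamma=0.99, alpha=0.01):
    """Minimal semi-gradient TD(0): for a linear value function
    v(s) = w . x(s), the gradient w.r.t. w is just x(s), so each
    transition nudges w toward the one-step bootstrapped target."""
    w = np.zeros(features.shape[1])
    for x, r, x_next in zip(features, rewards, next_features):
        v = w @ x                          # current value estimate
        target = r + gamma * (w @ x_next)  # bootstrapped TD target
        # Gradient step on the squared TD error, treating the target as
        # a constant (hence "semi-gradient").
        w += alpha * (target - v) * x
    return w

# Toy usage with random transition data (hypothetical):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))       # feature vectors x(s_t)
X_next = rng.normal(size=(100, 4))  # feature vectors x(s_{t+1})
R = rng.normal(size=100)            # rewards r_t
print(semi_gradient_td0(X, R, X_next))
```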
## Frameworks, libraries, and tools
- Neural network inference on FPGAs; targets quantized neural networks.
- Even though the project is active, newer boards such as the PYNQ-Z2 or the Alveo U50 are not tested.
- Introduces quantized operators for ONNX.
- Quantized implementations of most PyTorch layers, e.g., `QuantConv1d`; enables low-precision arithmetic (8-bit, 4-bit, etc.), which reduces DSP usage and memory footprint (see the first sketch after this list).
- Quantized implementations of Keras layers, e.g., `smooth_sigmoid(x)` and `hard_sigmoid(x)`; includes an energy consumption estimator (see the second sketch after this list).
- ONNX model visualization.
- Resource and latency estimation for ML on FPGAs.
- Converts neural network models to FPGA firmware (see the third sketch after this list).
- Framework for running HLS on many configurations of a design and comparing the results.
- Vitis libraries for HLS.
- For more flexible AI inference compared to Brevitas and FINN; compiled code runs on a micro-coded DPU (deep learning processing unit).
- Cycle-accurate RAM simulator, including HBM.
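The `QuantConv1d` layer mentioned above is the Brevitas class of that name; here is a minimal sketch of how such a quantized layer is typically dropped into a PyTorch model. The bit widths and shapes are arbitrary choices, and I am assuming the `weight_bit_width`/`bit_width` keywords Brevitas uses for precision.

```python
import torch
from brevitas.nn import QuantConv1d, QuantReLU

# 4-bit weights: intended as a drop-in replacement for torch.nn.Conv1d.
conv = QuantConv1d(in_channels=8, out_channels=16, kernel_size=3,
                   weight_bit_width=4)
act = QuantReLU(bit_width=4)   # 4-bit quantized activations

x = torch.randn(1, 8, 32)      # (batch, channels, length)
y = act(conv(x))
print(y.shape)                 # torch.Size([1, 16, 30])
```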
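`smooth_sigmoid(x)` and `hard_sigmoid(x)` above are QKeras helpers; a minimal sketch of QKeras-style quantized Keras layers follows, using `QDense` with `quantized_bits` quantizers and a quantized activation. The bit widths and layer sizes are arbitrary.

```python
from tensorflow import keras
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# 6-bit weights/biases and 4-bit activations (arbitrary choices).
model = keras.Sequential([
    keras.Input(shape=(16,)),
    QDense(32,
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_relu(4)),
    QDense(1, kernel_quantizer=quantized_bits(6, 0, alpha=1)),
])
model.summary()
```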
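For the model-to-firmware converter above, one widely used tool of this kind is hls4ml; below is a minimal sketch of its usual Keras flow. The model, output directory, and configuration granularity are placeholders, and the final `build()` step (commented out) requires the AMD/Xilinx toolchain.

```python
import hls4ml
from tensorflow import keras

# A small placeholder model to convert.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])

# Derive a per-model precision/parallelism config, then convert.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls4ml_prj")  # placeholder dir
hls_model.compile()            # builds a C-simulation model for quick checks
# hls_model.build(csim=False)  # runs HLS synthesis; needs Vivado/Vitis installed
```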
## Neural network implementations
- SystemVerilog; datatypes are fixed, so quantization is probably not possible.
- SystemVerilog; datatypes are fixed, so quantization is probably not possible.
- Converts AI models to Verilog; from IBM, but not maintained.
- Common operators in CNNs.
- Configurable multiplier and other components; no documentation.
- Sparse attention for LLMs; contains a hardware implementation, including a dot-product implementation (see the sketch after this list).
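As a software reference for what the sparse-attention hardware above computes, here is a minimal NumPy sketch of scaled dot-product attention; sparsity enters as a boolean mask that removes key positions from the softmax. The shapes and data are placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d)) V; `mask` (True = keep) makes it sparse."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # one dot product per (q, k) pair
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # drop masked-out positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```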
## Math libraries
- Hardware design library based on SpinalHDL; includes a systolic array (see the first sketch after this list).
- Floating point with user-programmable exponent and mantissa widths (see the second sketch after this list).
- A pipelined multiplier in SpinalHDL.
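To illustrate what the systolic array in the first entry computes, here is a small Python simulation of an output-stationary systolic matrix multiply; the index arithmetic mirrors the skewed dataflow, where the operands for PE (i, j) arrive on clock step i + j + k. This is an idealized model, not the library's implementation.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.
    PE (i, j) accumulates C[i, j]; inputs are skewed so that A[i, k] and
    B[k, j] meet at PE (i, j) on global clock step i + j + k."""
    n, m = A.shape
    _, p = B.shape
    C = np.zeros((n, p))
    for step in range(n + p + m - 2):   # one iteration = one clock cycle
        for i in range(n):
            for j in range(p):
                k = step - i - j        # which operand pair arrives now
                if 0 <= k < m:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6).reshape(2, 3).astype(float)
B = np.arange(12).reshape(3, 4).astype(float)
assert np.allclose(systolic_matmul(A, B), A @ B)
```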
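And to make "user-programmable exponent and mantissa" concrete, here is an idealized Python model that rounds a value to a custom float format; the library's actual rounding, subnormal, and overflow behavior may differ.

```python
import math

def quantize_float(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value representable with `exp_bits` exponent
    bits and `man_bits` mantissa bits (IEEE-style bias, implicit leading 1;
    subnormals and overflow saturation omitted for brevity)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    e -= 1                           # exponent for the 1.mantissa form
    e = max(min(e, bias), 1 - bias)  # clamp to the representable range
    scale = 1 << man_bits
    mant = round(abs(x) / 2.0 ** e * scale) / scale  # round mantissa
    return math.copysign(mant * 2.0 ** e, x)

print(quantize_float(3.14159, exp_bits=5, man_bits=3))  # -> 3.25
```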
## Reinforcement learning implementations
- RL via DPC++ (SYCL) and Torch.
- Three-stage pipeline.
- Proximal policy optimization (PPO) implementation; not documented (see the sketch after this list).
- A toolkit for benchmarking FPGA-accelerated reinforcement learning; not maintained.
- Experiences with Vitis AI for deep RL: pruning increases performance; tried only on GPUs, not on FPGAs.
- Vitis HLS-based reinforcement learning code; no documentation.
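For reference, the core of PPO is the clipped surrogate objective; here is a minimal PyTorch sketch of that loss. The batch tensors and `eps` value are placeholders, and the value-function and entropy terms of the full PPO loss are omitted.

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Clipped PPO surrogate: L = -E[min(r * A, clip(r, 1-eps, 1+eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) is the probability ratio."""
    ratio = torch.exp(log_probs - old_log_probs)          # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()          # negate: we minimize

# Toy usage with fake batch data (hypothetical):
lp = torch.randn(64, requires_grad=True)      # log pi_new(a|s)
old_lp = lp.detach() + 0.1 * torch.randn(64)  # log pi_old(a|s)
adv = torch.randn(64)                         # advantage estimates
loss = ppo_clipped_loss(lp, old_lp, adv)
loss.backward()
print(float(loss))
```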