In part 2 of this episode on software performance, I show how to embed C code in Python and discuss multi-process and multi-threaded software, special-purpose vectorized instructions and how to use them through NumPy, general-purpose GPU programming, tensor processors, and Amdahl's law, which fundamentally limits the gains to be had from parallel processing.
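
As a rough illustration of the Amdahl's law point, here is a minimal Python sketch of the standard formula: if a fraction p of a program's run time can be parallelized across n workers, the best possible speedup is 1 / ((1 - p) + p / n). The function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for illustration, not figures from the episode.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel.

    parallel_fraction: share of the original run time that can be parallelized (0..1).
    workers: number of parallel workers (processes, threads, cores, ...).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers there are;
    # only the parallel part shrinks as workers are added.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program in which 90% of the run time can be parallelized.
    for workers in (1, 2, 8, 64, 1024):
        print(f"{workers:5d} workers: at most {amdahl_speedup(0.9, workers):.2f}x faster")
```

Under this assumption the speedup never reaches 10x, however many workers are added, because the remaining 10% of serial work sets the floor on run time.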