Deep Learning: How compilers help optimize models on hardware


To run deep learning models efficiently on different target hardware, compilers such as TensorFlow XLA and Apache TVM rely on the LLVM compiler stack.
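
What that looks like in practice: below is a minimal sketch, not taken from the article, of compiling an ONNX model with Apache TVM through its LLVM backend. The file name model.onnx, the input name "input" and the input shape are placeholder assumptions.

    # Minimal sketch: compile an ONNX model with Apache TVM via LLVM.
    # "model.onnx", the input name and the shape are placeholders.
    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("model.onnx")
    mod, params = relay.frontend.from_onnx(
        onnx_model, shape={"input": (1, 3, 224, 224)}
    )

    # The target string "llvm" makes TVM emit LLVM IR and lets LLVM
    # generate native machine code for the host CPU.
    target = tvm.target.Target("llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)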

The LLVM project has produced more than a well-established compiler framework for translating high-level languages into machine code for different processor architectures: it is also used more and more to optimize machine learning models.

LLVM’s intermediate representation helps translate the hardware-specific operations and data structures of deep learning models optimally to different target hardware. Both open-source and commercial products are vying for the attention of data scientists and makers of custom AI hardware.
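
To illustrate the retargeting, here is a hedged continuation of the TVM sketch above: changing only the LLVM target triple cross-compiles the same model for a 64-bit ARM device instead of the host CPU. The triple, the NEON attribute and the cross-compiler name are assumptions.

    # Sketch, reusing the assumed "mod" and "params" from above.
    # Swapping the LLVM target triple retargets the same model.
    target_arm = tvm.target.Target(
        "llvm -mtriple=aarch64-linux-gnu -mattr=+neon"
    )
    with tvm.transform.PassContext(opt_level=3):
        lib_arm = relay.build(mod, target=target_arm, params=params)

    # Export a shared library for the ARM device; this assumes an
    # ARM cross-compiler such as aarch64-linux-gnu-g++ is installed.
    lib_arm.export_library("model_arm.so", cc="aarch64-linux-gnu-g++")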

Unlike software written in high-level languages, machine learning models are not programmed but trained. Large deep learning models are typically not trained on the target hardware they are meant to run on, but on large GPU clusters in the cloud or in the HPC section of a company’s own data center. Training a model places completely different demands on the hardware than executing the trained model does.