Faster and Lighter Model Inference with ONNX Runtime from Cloud to Client
Video: http://youtube.com/watch?v=WDww8ce12Mc

ONNX Runtime is a high-performance inferencing and training engine for machine learning models. This show focuses on ONNX Runtime for model inference. ONNX Runtime has been widely adopted by a variety of Microsoft products, including Bing, Office 365, and Azure Cognitive Services, achieving an average 2.9x inference speedup. Now we are glad to introduce ONNX Runtime quantization and ONNX Runtime mobile, which further accelerate model inference with even smaller model and runtime sizes. ONNX Runtime keeps evolving, not only for cloud-based inference but also for on-device inference.

Jump To:
• [00:00] Livestream begins
• [01:02] ONNX and ONNX Runtime overview https://aka.ms/AIShow/ONNXRuntimeGH
• [02:26] Model operationalization with ONNX Runtime
• [04:04] ONNX Runtime adoption
• [05:07] ONNX Runtime INT8 quantization for model size reduction and inference speedup
• [09:46] Demo of ONNX Runtime INT8 quantization
• [16:00] ONNX Runtime mobile for runtime size reduction

Learn More:
• ONNX Runtime https://aka.ms/AIShow/ONNXRuntimeGH
• Faster and smaller quantized NLP with Hugging Face and ONNX Runtime https://aka.ms/AIShow/QuantizedNLP
• ONNX Runtime for Mobile Platforms https://aka.ms/AIShow/RuntimeforMobil...
• ONNX Runtime Inference on Azure Machine Learning https://aka.ms/AIShow/RuntimeInferenc...
• Follow ONNX AI / onnxai
• Follow ONNX Runtime / onnxruntime
• Create a free Azure account https://aka.ms/aishow-seth-azurefree
• Deep Learning vs. Machine Learning https://aka.ms/AIShow/DLvML
• Get Started with Machine Learning https://aka.ms/AIShow/StartML
• Follow Seth / sethjuarez
• Don't miss new episodes — subscribe to the AI Show https://aka.ms/aishowsubscribe
