benchmark – AI News

NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud

Lead Editor — Tue, 03 Nov 2020 15:55:37 +0000

NVIDIA’s A100 set a new record in the MLPerf benchmark last month and now it’s accessible through Amazon’s cloud.

Amazon Web Services (AWS) first launched a GPU instance 10 years ago with the NVIDIA M2050. It’s rather poetic that, a decade on, NVIDIA is now providing AWS with the hardware to power the next generation of groundbreaking innovations.

The A100 outperformed CPUs in this year’s MLPerf by up to 237x in data centre inference. A single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA, following the benchmark results.

Businesses can access the A100 in AWS’ P4d instance. NVIDIA claims the instances reduce the time to train machine learning models by up to 3x with FP16 and up to 6x with TF32 compared to the default FP32 precision.

Each P4d instance features eight NVIDIA A100 GPUs. If even more performance is required, customers are able to access over 4,000 GPUs at a time using AWS’ Elastic Fabric Adapter (EFA).

Dave Brown, Vice President of EC2 at AWS, said:

“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower-cost way to train their massive machine learning models.
Now, with EC2 UltraClusters of P4d instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”

P4d supports 400Gbps networking and makes use of NVIDIA’s technologies including NVLink, NVSwitch, NCCL, and GPUDirect RDMA to further accelerate deep learning training workloads.

Some of AWS’ customers across various industries have already begun exploring how the P4d instance can help their business.

Karley Yoder, VP & GM of Artificial Intelligence at GE Healthcare, commented:

“Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results.
Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”

For an example from a different industry, the research arm of Toyota is exploring how P4d can improve their existing work in developing self-driving vehicles and groundbreaking new robotics.

“The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours,” explained Mike Garrison, Technical Lead of Infrastructure Engineering at Toyota Research Institute.

“We are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”

P4d instances are currently available in the US East (N. Virginia) and US West (Oregon) regions. AWS says further availability is planned soon.

You can find out more about P4d instances and how to get started here.

The post NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud appeared first on AI News.

NVIDIA sets another AI inference record in MLPerf

Lead Editor — Thu, 22 Oct 2020 09:16:41 +0000

NVIDIA has set yet another record for AI inference in MLPerf with its A100 Tensor Core GPUs.

MLPerf consists of five inference benchmarks which cover the three main AI applications today: image classification, object detection, and translation.

“Industry-standard MLPerf benchmarks provide relevant performance data on widely used AI networks and help make informed AI platform buying decisions,” said Rangan Majumder, VP of Search and AI at Microsoft.

Last year, NVIDIA led all five benchmarks for both server and offline data centre scenarios with its Turing GPUs. A dozen companies participated.

This year, 23 companies participated in MLPerf, but NVIDIA maintained its lead, with the A100 outperforming CPUs by up to 237x in data centre inference.

For perspective, NVIDIA notes that a single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA.

“The work we’ve done to achieve these results on MLPerf gives companies a new level of AI performance to improve our everyday lives.”

The widespread availability of NVIDIA’s AI platform through every major cloud and data centre infrastructure provider is unlocking huge potential for companies across various industries to improve their operations.

Nvidia comes out on top in first MLPerf inference benchmarks

Lead Editor — Thu, 07 Nov 2019 11:19:57 +0000

The first benchmark results from the MLPerf consortium have been released and Nvidia is a clear winner for inference performance.

For those unaware, inference is the stage where a trained deep learning model processes incoming data to produce predictions.

MLPerf is a consortium which aims to provide “fair and useful” standardised benchmarks for inference performance. MLPerf can be thought of as doing for inference what SPEC does for benchmarking CPUs and general system performance.

The consortium has released its first benchmarking results, a painstaking effort involving over 30 companies and over 200 engineers and practitioners. MLPerf’s first call for submissions led to over 600 measurements spanning 14 companies and 44 systems.

However, for datacentre inference, only four of the processors are commercially available:

Intel Xeon P9282
Habana Goya
Google TPUv3
Nvidia Turing

Nvidia wasted no time in boasting that its performance beat the three other processors across various neural networks in both server and offline scenarios.

The easiest direct comparisons are possible in the ImageNet ResNet-50 v1.5 offline scenario, where the greatest number of major players and startups submitted results.

In that scenario, Nvidia once again boasted the best performance on a per-processor basis with its Titan RTX GPU. Although the 2x Google Cloud TPU v3-8 submission used eight Intel Skylake processors, it delivered similar performance to the SCAN 3XS DBP T496X2 Fluid, which used four Titan RTX cards (65,431.40 vs 66,250.40 inputs/second).
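To make the per-processor claim concrete, here is a minimal sketch that divides each system's offline throughput by its processor count. The throughput figures and the counts (eight and four) are taken at face value from the paragraph above; the dictionary layout and the `per_processor` helper are illustrative names, not part of any MLPerf tooling.

```python
# Offline ResNet-50 throughput (inputs/second) and processor counts,
# exactly as stated in the article text above.
submissions = {
    "2x Google Cloud TPU v3-8": {"inputs_per_second": 65431.40, "processors": 8},
    "SCAN 3XS DBP T496X2 Fluid": {"inputs_per_second": 66250.40, "processors": 4},
}


def per_processor(entry: dict) -> float:
    """Return throughput divided by the number of processors in the system."""
    return entry["inputs_per_second"] / entry["processors"]


# Hypothetical helper output: throughput per processor for each submission.
per_processor_results = {name: per_processor(e) for name, e in submissions.items()}

for name, value in per_processor_results.items():
    print(f"{name}: {value:,.1f} inputs/second per processor")
```

On these figures, each Titan RTX card handles roughly twice as many inputs per second as each processor in the Cloud TPU submission.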

Ian Buck, GM and VP of Accelerated Computing at NVIDIA, said:

“AI is at a tipping point as it moves swiftly from research to large-scale deployment for real applications.
AI inference is a tremendous computational challenge. Combining the industry’s most advanced programmable accelerator, the CUDA-X suite of AI algorithms and our deep expertise in AI computing, NVIDIA can help datacentres deploy their large and growing body of complex AI models.”

However, it’s worth noting that the Titan RTX doesn’t support ECC memory so – despite its sterling performance – this omission may prevent its use in some datacentres.

Another interesting takeaway when comparing the Cloud TPU results against Nvidia is the performance difference when moving from offline to server scenarios.

Google Cloud TPU v3 offline: 32,716.00
Google Cloud TPU v3 server: 16,014.29
Nvidia SCAN 3XS DBP T496X2 Fluid offline: 66,250.40
Nvidia SCAN 3XS DBP T496X2 Fluid server: 60,030.57

As you can see, the Cloud TPU system's performance is cut by just over half when used in a server scenario. The SCAN 3XS DBP T496X2 Fluid system's performance only drops by around 10 percent in comparison.
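The drops quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four throughput figures listed in this article; the `relative_drop` function is an illustrative name, not something provided by MLPerf.

```python
# Throughput figures as listed in the article.
results = {
    "Google Cloud TPU v3": {"offline": 32716.00, "server": 16014.29},
    "SCAN 3XS DBP T496X2 Fluid": {"offline": 66250.40, "server": 60030.57},
}


def relative_drop(offline: float, server: float) -> float:
    """Return the percentage of offline throughput lost in the server scenario."""
    return (offline - server) / offline * 100.0


drops = {
    name: relative_drop(r["offline"], r["server"]) for name, r in results.items()
}

for name, drop in drops.items():
    print(f"{name}: {drop:.1f}% lower in the server scenario")
```

With these inputs, the Cloud TPU system loses roughly 51 percent of its offline throughput, while the SCAN system loses roughly 9 percent.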

You can peruse MLPerf’s full benchmark results here.

Esteemed consortium launch AI natural language processing benchmark

Lead Editor — Thu, 15 Aug 2019 16:24:15 +0000

A research consortium featuring some of the greatest minds in AI is launching a benchmark to measure natural language processing (NLP) abilities.

The consortium includes Google DeepMind, Facebook AI, New York University, and the University of Washington. Each of the consortium’s members believes a more comprehensive benchmark is needed for NLP than current solutions.

The result is a benchmarking platform called SuperGLUE which replaces an older platform called GLUE with a “much harder benchmark with comprehensive human baselines,” according to Facebook AI.

SuperGLUE helps to put NLP abilities to the test where previous benchmarks were beginning to prove too simple for the latest systems.

“Within one year of release, several NLP models have already surpassed human baseline performance on the GLUE benchmark. Current models have advanced a surprisingly effective recipe that combines language model pretraining on huge text data sets with simple multitask and transfer learning techniques,” Facebook said.

In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers) which Facebook calls one of the biggest breakthroughs in NLP. Facebook took Google’s open-source work and identified changes to improve its effectiveness which led to RoBERTa (Robustly Optimized BERT Pretraining Approach).

RoBERTa basically “smashed it,” as the kids would say, in commonly used benchmarks, and Facebook counts it among the NLP models that have already surpassed human baseline performance on GLUE.

For the SuperGLUE benchmark, the consortium decided on tasks which meet four criteria:

Have varied formats.
Use more nuanced questions.
Are yet-to-be-solved using state-of-the-art methods.
Can be easily solved by people.

The new benchmark includes eight diverse and challenging tasks, including a Choice of Plausible Alternatives (COPA) causal reasoning task. COPA gives the system a premise sentence, and it must pick either the cause or the effect of that premise from two possible choices. Humans have managed to achieve 100 percent accuracy on COPA while BERT achieves just 74 percent.
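To illustrate the shape of a COPA-style task, here is a minimal sketch of how an item and its accuracy score could be represented. The `CopaExample` class, its field names, and the two sample items are invented for illustration and are not taken from the SuperGLUE data; `always_first` is a deliberately naive stand-in for a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CopaExample:
    """Hypothetical container for one COPA-style item."""
    premise: str
    question: str             # "cause" or "effect"
    choices: Tuple[str, str]  # exactly two candidate sentences
    label: int                # index (0 or 1) of the more plausible choice


def accuracy(examples: List[CopaExample],
             predict: Callable[[CopaExample], int]) -> float:
    """Fraction of examples where the predicted index matches the label."""
    correct = sum(1 for ex in examples if predict(ex) == ex.label)
    return correct / len(examples)


# Invented sample items, for illustration only.
examples = [
    CopaExample(
        premise="The pavement was wet this morning.",
        question="cause",
        choices=("It rained overnight.", "The sun was shining."),
        label=0,
    ),
    CopaExample(
        premise="The man dropped the glass.",
        question="effect",
        choices=("The glass floated upward.", "The glass shattered."),
        label=1,
    ),
]


def always_first(example: CopaExample) -> int:
    """Naive baseline that ignores the text and picks the first choice."""
    return 0


baseline_accuracy = accuracy(examples, always_first)
print(f"Baseline accuracy: {baseline_accuracy:.0%}")
```

Because every item has only two choices, guessing alone is expected to score about 50 percent, which helps put the gap between BERT's 74 percent and the human 100 percent in context.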

Across SuperGLUE’s tasks, RoBERTa is currently the leading NLP system and isn’t far behind the human baseline.

You can find a full breakdown of SuperGLUE and its various benchmarking tasks in a Facebook AI blog post here.

AnTuTu’s latest benchmark tests AI chip performance

Lead Editor — Wed, 30 Jan 2019 12:28:08 +0000

We can now better scrutinise manufacturers’ claims about AI chip performance improvements thanks to AnTuTu’s latest benchmark.

If you’ve ever read a comprehensive smartphone review, you’ve likely heard of AnTuTu. The company’s smartphone benchmarking tool is often used for testing and comparing the CPU and 3D performance of devices.

With dedicated AI chips now appearing in devices from the mid-range to flagships, AnTuTu has decided it’s time for a benchmark to determine their performance.

In a blog post, AnTuTu says its benchmark uses two categories – ‘Image Classification’, and ‘Object Recognition’.

AI News tested AnTuTu’s benchmark on a Huawei Mate 20 Pro which currently ranks second on AnTuTu’s general performance leaderboard for Android devices. Huawei often brags about the AI performance of its flagship devices.

The first test classifies 200 images as fast as possible using the Inception v3 neural network.

In the second test, a 600-frame video is reviewed using the MobileNet SSD neural network.

AnTuTu then delivers an overall benchmark score, along with the scores for each category.

Here is how our Mate 20 Pro fared:

Overall – 65,222
Image Classification – 41,717
Object Detection – 23,505

Each of the categories is further broken down into scores for ‘speed’ and ‘accuracy’. If accuracy is traded for speed, then a lower score will be given.

AnTuTu says this helps to prevent cheating by devices that process the data fast but without providing the right answers. Smartphone manufacturers have been caught artificially inflating their benchmarks in the past, so it provides added confidence in the results.

You can download the AI benchmark from AnTuTu here.
