amazon web services – AI News

AWS announces nine major updates for its ML platform SageMaker

Lead Editor — Wed, 09 Dec 2020 14:47:48 +0000

Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker.

SageMaker aims to provide a machine learning service which can be used to build, train, and deploy ML models for virtually any use case.

During this year’s re:Invent conference, AWS made several announcements to further improve SageMaker’s capabilities.

Swami Sivasubramanian, VP of Amazon Machine Learning at AWS, said:

“Hundreds of thousands of everyday developers and data scientists have used our industry-leading machine learning service, Amazon SageMaker, to remove barriers to building, training, and deploying custom machine learning models. One of the best parts about having such a widely-adopted service like SageMaker is that we get lots of customer suggestions which fuel our next set of deliverables.
Today, we are announcing a set of tools for Amazon SageMaker that makes it much easier for developers to build end-to-end machine learning pipelines to prepare, build, train, explain, inspect, monitor, debug, and run custom machine learning models with greater visibility, explainability, and automation at scale.”

The first announcement is Data Wrangler, a feature which aims to automate the preparation of data for machine learning.

Data Wrangler enables customers to choose the data they want from their various data stores and import it with a single click. Over 300 built-in data transformers are included to help customers normalise, transform, and combine features without having to write any code.

Frank Farrall, Principal of AI Ecosystems and Platforms Leader at Deloitte, comments:

“SageMaker Data Wrangler enables us to hit the ground running to address our data preparation needs with a rich collection of transformation tools that accelerate the process of machine learning data preparation needed to take new products to market.
In turn, our clients benefit from the rate at which we scale deployments, enabling us to deliver measurable, sustainable results that meet the needs of our clients in a matter of days rather than months.”

The second announcement is Feature Store. Amazon SageMaker Feature Store provides a new repository that makes it easy to store, update, retrieve, and share machine learning features for training and inference.

Feature Store aims to overcome the problem of storing features which are mapped to multiple models. A purpose-built feature store helps developers to access and share features that make it much easier to name, organise, find, and share sets of features among teams of developers and data scientists. Because it resides in SageMaker Studio – close to where ML models are run – AWS claims it provides single-digit millisecond inference latency.

Mammad Zadeh, VP of Engineering, Data Platform at Intuit, says:

“We have worked closely with AWS in the lead up to the release of Amazon SageMaker Feature Store, and we are excited by the prospect of a fully managed feature store so that we no longer have to maintain multiple feature repositories across our organization.
Our data scientists will be able to use existing features from a central store and drive both standardisation and reuse of features across teams and models.”

Next up, we have SageMaker Pipelines—which claims to be the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning.

Developers can define each step of an end-to-end machine learning workflow including the data-load steps, transformations from Amazon SageMaker Data Wrangler, features stored in Amazon SageMaker Feature Store, training configuration and algorithm set up, debugging steps, and optimisation steps.

SageMaker Clarify may be one of the most important features being debuted by AWS this week considering ongoing events.

Clarify aims to provide bias detection across the machine learning workflow, enabling developers to build greater fairness and transparency into their ML models. Rather than turn to often time-consuming open-source tools, developers can use the integrated solution to quickly try and counter any bias in models.

Andreas Heyden, Executive VP of Digital Innovations for the DFL Group, says:

“Amazon SageMaker Clarify seamlessly integrates with the rest of the Bundesliga Match Facts digital platform and is a key part of our long-term strategy of standardising our machine learning workflows on Amazon SageMaker.
By using AWS’s innovative technologies, such as machine learning, to deliver more in-depth insights and provide fans with a better understanding of the split-second decisions made on the pitch, Bundesliga Match Facts enables viewers to gain deeper insights into the key decisions in each match.”

Deep Profiling for Amazon SageMaker automatically monitors system resource utilisation and provides alerts where required for any detected training bottlenecks. The feature works across frameworks (PyTorch, Apache MXNet, and TensorFlow) and collects system and training metrics automatically without requiring any code changes in training scripts.

Next up, we have Distributed Training on SageMaker which AWS claims makes it possible to train large, complex deep learning models up to two times faster than current approaches.

Kristóf Szalay, CTO at Turbine, comments:

“We use machine learning to train our in silico human cell model, called Simulated Cell, based on a proprietary network architecture. By accurately predicting various interventions on the molecular level, Simulated Cell helps us to discover new cancer drugs and find combination partners for existing therapies.
Training of our simulation is something we continuously iterate on, but on a single machine each training takes days, hindering our ability to iterate on new ideas quickly.
We are very excited about Distributed Training on Amazon SageMaker, which we are expecting to decrease our training times by 90% and to help us focus on our main task: to write a best-of-the-breed codebase for the cell model training.
Amazon SageMaker ultimately allows us to become more effective in our primary mission: to identify and develop novel cancer drugs for patients.”

SageMaker’s Data Parallelism engine scales training jobs from a single GPU to hundreds or thousands by automatically splitting data across multiple GPUs, improving training time by up to 40 percent.

With edge computing advancements increasing rapidly, AWS is keeping pace with SageMaker Edge Manager.

Edge Manager helps developers to optimise, secure, monitor, and maintain ML models deployed on fleets of edge devices. In addition to helping optimise ML models and manage edge devices, Edge Manager also provides the ability to cryptographically sign models, upload prediction data from devices to SageMaker for monitoring and analysis, and view a dashboard which tracks and provided a visual report on the operation of the deployed models within the SageMaker console.

Igor Bergman, VP of Cloud and Software of PCs and Smart Devices at Lenovo, comments:

“SageMaker Edge Manager will help eliminate the manual effort required to optimise, monitor, and continuously improve the models after deployment. With it, we expect our models will run faster and consume less memory than with other comparable machine-learning platforms.
As we extend AI to new applications across the Lenovo services portfolio, we will continue to require a high-performance pipeline that is flexible and scalable both in the cloud and on millions of edge devices. That’s why we selected the Amazon SageMaker platform. With its rich edge-to-cloud and CI/CD workflow capabilities, we can effectively bring our machine learning models to any device workflow for much higher productivity.”

Finally, SageMaker JumpStart aims to make it easier for developers which have little experience with machine learning deployments to get started.

JumpStart provides developers with an easy-to-use, searchable interface to find best-in-class solutions, algorithms, and sample notebooks. Developers can select from several end-to-end machine learning templates(e.g. fraud detection, customer churn prediction, or forecasting) and deploy them directly into their SageMaker Studio environments.

AWS has been on a roll with SageMaker improvements—delivering more than 50 new capabilities over the past year. After this bumper feature drop, we probably shouldn’t expect any more until we’ve put 2020 behind us.

You can find coverage of AWS’ more cloud-focused announcements via our sister publication CloudTech here.

The post AWS announces nine major updates for its ML platform SageMaker appeared first on AI News.

NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud

Lead Editor — Tue, 03 Nov 2020 15:55:37 +0000

NVIDIA’s A100 set a new record in the MLPerf benchmark last month and now it’s accessible through Amazon’s cloud.

Amazon Web Services (AWS) first launched a GPU instance 10 years ago with the NVIDIA M2050. It’s rather poetic that, a decade on, NVIDIA is now providing AWS with the hardware to power the next generation of groundbreaking innovations.

The A100 outperformed CPUs in this year’s MLPerf by up to 237x in data centre inference. A single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA, following the benchmark results.

Businesses can access the A100 in AWS’ P4d instance. NVIDIA claims the instances reduce the time to train machine learning models by up to 3x with FP16 and up to 6x with TF32 compared to the default FP32 precision.

Each P4d instance features eight NVIDIA A100 GPUs. If even more performance is required, customers are able to access over 4,000 GPUs at a time using AWS’s Elastic Fabric Adaptor (EFA).

Dave Brown, Vice President of EC2 at AWS, said:

“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower-cost way to train their massive machine learning models.
Now, with EC2 UltraClusters of P4d instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”

P4d supports 400Gbps networking and makes use of NVIDIA’s technologies including NVLink, NVSwitch, NCCL, and GPUDirect RDMA to further accelerate deep learning training workloads.

Some of AWS’ customers across various industries have already begun exploring how the P4d instance can help their business.

Karley Yoder, VP & GM of Artificial Intelligence at GE Healthcare, commented:

“Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results.
Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”

For an example from a different industry, the research arm of Toyota is exploring how P4d can improve their existing work in developing self-driving vehicles and groundbreaking new robotics.

“The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours,” explained Mike Garrison, Technical Lead of Infrastructure Engineering at Toyota Research Institute.

“We are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”

P4d instances are currently available in the US East (N. Virginia) and US West (Oregon) regions. AWS says further availability is planned soon.

You can find out more about P4d instances and how to get started here.

The post NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud appeared first on AI News.