Explore vSphere + Bitfusion – The Future of AI & ML

We’re back with another vSphere Tweet Chat recap! In this edition, we explore vSphere and Bitfusion, and what this integration means for the future of AI and ML. To answer all of your questions, experts Jim Brogan (@brogan_record) and Don Sullivan (@dfsulliv) joined us to share the inside scoop. They answered some tough questions, and you’ll have to continue reading to find out more. Check out the full chat recap:

Q1: What is Bitfusion?

Jim Brogan

A1: Bitfusion does for GPUs, and hardware accelerators in general, what vSphere did for CPUs (compute).

A1 Part 2: In the AI/ML space, for example, GPUs are underutilized (15% utilization would be representative). Bitfusion is virtualization that lets you share GPUs, plus it has some management capabilities and analytics.

Q2: Why is Bitfusion being brought into vSphere?

Jim Brogan

A2: Any cloud, any device, any apps in today’s world must include ML/AI applications, and the apps need acceleration. Bitfusion ensures ML acceleration gets first-class-citizen treatment when it comes to virtualization and flexibility.

Don Sullivan

A2: Bitfusion is a perfect fit for vSphere and modern customer needs. VMware recognized that Bitfusion can virtualize powerful GPUs in the same way that vSphere virtualizes CPUs. This is a complementary and natural evolution of vSphere.

A2 Part 2: Bitfusion is a perfect fit for vSphere, in that it virtualizes GPUs the same way that vSphere has always virtualized CPUs. Bitfusion is a natural evolution for the needs of VMware's high-tech customers.

Q3: Why do AI & ML applications need GPUs or other hardware acceleration?

Jim Brogan

A3: These apps perform very large numbers of mathematical operations that are easy to compute in parallel but take days on serial processors.

A3 Part 2: GPUs offer this parallelism and can run these applications hundreds of times faster, consuming less power along the way. FPGA and ASIC acceleration are coming soon.

Q4: What are the marketplace problems with GPUs & what is Bitfusion trying to solve?

Jim Brogan

A4: GPUs are expensive, so they can’t be purchased for every user who wants one. The ones that are purchased are underutilized, and it is hard for users to share them.

A4 Part 2: They tend to sit in multiple silos of their own, outside the resources managed by the vSphere admin and apart from other virtualized resources (like compute, storage, and networking).

A4 Part 3: Users get stuck with a single set of resources when they need bigger sets and more variety.

Don Sullivan

A4: BF represents a new threshold of capabilities of vSphere, and the FORCE of vSphere has been expanded greatly.

Q5: Does Bitfusion solve marketplace problems for GPUs and AI/ML apps?

Jim Brogan

A5: Of course! Utilization and efficiency increase because you can share the GPUs. A vSphere admin can put an organization’s GPU servers into pools/vCenter clusters for easy management, and all the users can still access what they need.

A5 Part 2: Users no longer have to port their environments to specific hardware or clean up when someone else needs to use it. Users have access to a larger pool of acceleration with wider variety.

Q6: What exactly does virtualization mean for Bitfusion and GPUs?

Jim Brogan

A6: There are two orthogonal types of virtualization, and both are dynamically available, with no rebooting or migration required:

A6 Part 2: The first is remote access to GPUs across the network, which enables sharing by multiple users over time. The second is splitting GPUs into fractions of any size, which enables shared, concurrent use.
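As a rough illustration of the second type (fractional, concurrent sharing), the sketch below models one GPU as a pool that clients claim fractions of and later release. The names `GpuPool`, `allocate`, and `release` are hypothetical and chosen only for this example; they are not Bitfusion's interface, and a real implementation would partition actual GPU memory rather than track numbers.

```python
class GpuPool:
    """Toy model of one GPU shared concurrently through fractional allocations.

    Hypothetical names for illustration only; not Bitfusion's actual API.
    """

    def __init__(self):
        self.allocations = {}  # client name -> fraction of the GPU it holds

    def free_fraction(self):
        """Fraction of the GPU that is not currently allocated."""
        return 1.0 - sum(self.allocations.values())

    def allocate(self, client, fraction):
        """Grant `fraction` of the GPU to `client`; return False if it does not fit."""
        if not 0.0 < fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        if client in self.allocations:
            raise ValueError(f"{client} already holds an allocation")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if fraction > self.free_fraction() + 1e-9:
            return False
        self.allocations[client] = fraction
        return True

    def release(self, client):
        """Return the client's share to the pool (no-op if it holds nothing)."""
        self.allocations.pop(client, None)
```

In this model, two clients holding 0.5 and 0.25 of the GPU run side by side, a third request for 0.5 is refused until one of them releases its share, and no reboot-like step is involved in either direction.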

Don Sullivan

A6: That’s perpendicular in multiple directions!!!

Q7: How does Bitfusion virtualization work?

Jim Brogan

A7: Big topic, but we intercept the API calls to the hardware accelerator. This approach avoids complicated work in the hypervisor.

A7 Part 2: This approach means that there are no changes and no recompilation of the application or anything in its stack (frameworks, CUDA, drivers, OS).

A7 Part 3: A good talk on this was given at Tech Field Day during the VMworld conference last August.
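To make the interception idea concrete, here is a minimal sketch in Python. It is an analogy, not Bitfusion's mechanism: the classes `LocalAccelerator` and `InterceptingProxy` are invented for this example, and the proxy forwards to a local object where a real interposer would forward CUDA calls to a remote GPU server. What it shows is that the application function is identical in both cases.

```python
class LocalAccelerator:
    """Stand-in for an accelerator API that an application calls directly."""

    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]


class InterceptingProxy:
    """Exposes the same methods as the wrapped API but sees every call first.

    Hypothetical illustration: here calls are only logged and forwarded locally.
    """

    def __init__(self, backend):
        self._backend = backend
        self.call_log = []

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself,
        # i.e. for the API methods the application is trying to reach.
        target = getattr(self._backend, name)

        def intercepted(*args, **kwargs):
            self.call_log.append(name)       # the interception point
            return target(*args, **kwargs)   # forward to the real implementation

        return intercepted


def application(api):
    """Application code: unchanged whether `api` is direct or intercepted."""
    return api.vector_add([1, 2, 3], [10, 20, 30])
```

Calling `application(LocalAccelerator())` and `application(InterceptingProxy(LocalAccelerator()))` gives the same result; only the proxy's `call_log` reveals that a layer sat in between.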

Q8: What types of applications or use-cases does Bitfusion address?

Jim Brogan

A8: AI and ML applications are our primary focus, but any CUDA-using application is a candidate. This includes HPC, wherever GPU utilization is a concern. Desktop graphics and other non-CUDA-using applications are not candidates.

Q9: Does Bitfusion provide any management capabilities?

Jim Brogan

A9: Bitfusion tracks, charts and exports GPU allocation, utilization and network traffic on a per-GPU-server and per-CPU-client basis. Bitfusion enforces policies that deallocate unused GPUs on a client-by-client basis.
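One way to picture a client-by-client deallocation policy is an idle timeout per client. The sketch below is an assumption about how such a policy could look, not a description of Bitfusion's implementation; the function name `enforce_idle_policy` and the idea of expressing the policy as a per-client limit in seconds are both hypothetical.

```python
def enforce_idle_policy(allocations, last_used, idle_limits, now):
    """Reclaim GPUs from clients that have been idle longer than their own limit.

    allocations: client -> number of GPUs held (modified in place)
    last_used:   client -> timestamp (seconds) of the client's last GPU activity
    idle_limits: client -> allowed idle time in seconds (hypothetical policy format)
    now:         current timestamp in seconds
    Returns the list of clients whose GPUs were deallocated.
    """
    reclaimed = []
    for client in sorted(allocations):
        idle_for = now - last_used[client]
        if idle_for > idle_limits[client]:
            reclaimed.append(client)
    # Deallocate after the scan so the dict is not modified while iterating.
    for client in reclaimed:
        del allocations[client]
    return reclaimed
```

With different limits per client, one idle client can lose its GPUs after 30 seconds while another keeps them for five minutes, which is the sense of "client-by-client" used here.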

Q10: What are the #Bitfusion network requirements?

Jim Brogan

A10: 10 Gbps; < 50 microseconds latency from the client to the GPU server.

A10 Part 2: At least that is the goal. The latency is desirable for best performance, not for functionality.
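To see why both figures matter, a back-of-the-envelope model helps. The function below is a simplification that treats the latency figure as a fixed per-message delay and ignores protocol overhead; both are assumptions made for this example, and the name `transfer_time_seconds` is invented here.

```python
def transfer_time_seconds(payload_bytes, bandwidth_bps=10e9, latency_s=50e-6):
    """Estimate the time to move one message between client and GPU server.

    Simplified model (an assumption): fixed latency plus payload size
    divided by bandwidth. Defaults are the figures from the answer above.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return latency_s + (payload_bytes * 8) / bandwidth_bps
```

Under this model, a 4 KB message is dominated by the 50-microsecond latency term, while a 1 GB transfer is dominated by the bandwidth term (about 0.8 seconds at 10 Gbps). That fits the point that low latency is about performance rather than functionality.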

Don Sullivan

A10: VMware will be delivering a number of Bitfusion related sessions at various #VMUGs in 2020.

Q11: How can users try out the #Bitfusion software (conduct a POC)?

Jim Brogan

A11: During the period when Bitfusion is being integrated into vSphere, Bitfusion is not for sale. But…you can write to [email protected] and be approved for a POC.

A11 Part 2: We deliver an OVA with the software, licensing, application stack, and dataset with support. You can also create your own VMs and stacks and use the same license (but interactive support on this path is very limited).

A11 Part 3: The original Bitfusion documentation is still publicly available.

Don Sullivan

A11: You will be asked to fill out a POC form and you will be allocated access to the software.

Q12: Are there other requirements for running #Bitfusion software?

Jim Brogan

A12: Currently it requires a Linux OS (Ubuntu, RHEL or CentOS).

Q13: Isn’t network latency a big problem for remote access to GPUs?

Jim Brogan

A13: Naturally, but Bitfusion’s primary work is to solve the concerns with latency.

A13 Part 2: When Bitfusion intercepts CUDA calls, it takes the opportunity to optimize operations, so as to hide latency. These optimizations include re-ordering, pipelining and batching. A lot of Bitfusion IP is actually in the optimization domain.
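A simple cost model shows why batching hides latency. This is a sketch under stated assumptions, not a measurement of Bitfusion: every unbatched call is assumed to pay the full network latency, every batch is assumed to pay it once, and the function names are invented for the example.

```python
def unbatched_time(num_calls, latency_s, compute_s_per_call):
    """Each call pays the network latency before the next call is sent."""
    return num_calls * (latency_s + compute_s_per_call)


def batched_time(num_calls, latency_s, compute_s_per_call, batch_size):
    """Calls are grouped so that each batch pays the network latency only once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    num_batches = -(-num_calls // batch_size)  # ceiling division
    return num_batches * latency_s + num_calls * compute_s_per_call
```

For 1,000 calls that each need 10 microseconds of GPU work over a 50-microsecond link, the unbatched model gives about 60 milliseconds, while batches of 100 give about 10.5 milliseconds, because the latency is paid 10 times instead of 1,000.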

Bonus Questions Not Featured

Q14: Is Bitfusion exclusively an NVIDIA GPU solution?

Jim Brogan

A14: The technology applies to any acceleration hardware accessed by an API. Bitfusion today has created implementations for the CUDA API and OpenCL. But CUDA is more-or-less the only player in the marketplace right now.

Q15: NVIDIA offers vGPU (GRID). How does Bitfusion work with or compete against it?

Jim Brogan

A15: Bitfusion can work on top of GRID, but is made to address userspace applications (e.g., using CUDA), not desktop graphics. It runs in userspace, not the hypervisor, so it can allocate and partition GPUs dynamically.

Thank you for joining our vSphere + Bitfusion Tweet Chat, featuring our (awesome) vSphere experts. A huge shout out to our experts, Jim Brogan (@brogan_record) and Don Sullivan (@dfsulliv), and the other participants who joined us today.

Stay tuned for our monthly expert chats and join the conversation by using the #vSphereChat hashtag. In the meantime, you can check out our vSphere Tweet Chat hub blog to access a recap of all our previous chats. Have a specific topic that you’d like to cover? Reach out and we’ll bring the topic to our experts for consideration. For now, we’ll see you in the Twittersphere!

The post Explore vSphere + Bitfusion – The Future of AI & ML appeared first on VMware vSphere Blog.
