OpenStack Nova Architecture

NOTE: I’ve updated and expanded this blog post for the Folsom release.

One of the common refrains I hear from people getting started with OpenStack is the lack of good introductory architectural overviews of the project. I was confronted by the same problem when I first started with the project - it was easy to get the low level code and API documentation but it was very difficult to find a “lay of the land”-type overview. Now that Cactus (OpenStack’s third version) has been released, I thought I’d take advantage of the lull in development to write up a quick architectural overview from my point of view. Since OpenStack is a fairly broad topic, I’ll break my thoughts into several posts. Today’s post will deal with OpenStack Nova’s (compute cloud) high level architecture.

Before we dive into the conceptual and logical architectures, let’s take a second to explain the OpenStack project:

OpenStack is a collection of open source technologies delivering a massively scalable cloud operating system.

You can think of it as software to power your own Infrastructure as a Service (IaaS) offering like Amazon Web Services. It currently encompasses three main projects:

  • Swift which provides object/blob storage. This is roughly analogous to Rackspace Cloud Files (from which it is derived) or Amazon S3.
  • Glance which provides discovery, storage and retrieval of virtual machine images for OpenStack Nova.
  • Nova which provides virtual servers on demand. This is similar to Rackspace’s Cloud Servers or Amazon EC2.

While these three projects provide the core of the cloud infrastructure, OpenStack is open and evolving — there will be more projects (there are already related projects for web interfaces and a queue service). With that brief introduction, let’s delve into a conceptual architecture and then examine how OpenStack Nova could map to it.

Cloud Provider Conceptual Architecture

Imagine that we are going to build our own IaaS cloud and offer it to customers. To achieve this, we would need to provide several high level features:

  1. Allow application owners to register for our cloud services, view their usage and see their bill (basic customer relations management functionality)
  2. Allow Developers/DevOps folks to create and store custom images for their applications (basic build-time functionality)
  3. Allow DevOps/Developers to launch, monitor and terminate instances (basic run-time functionality)
  4. Allow the Cloud Operator to configure and operate the cloud infrastructure

While there are certainly many, many other features that we would need to offer (especially if we were to follow a more complete industry framework like eTOM), these four get to the very heart of providing IaaS. Now, assuming that you agree with these four top level features, you might put together a conceptual architecture that looks something like this:

IaaS Conceptual Architecture

In this model, I’ve imagined four sets of users (developers, devops, owners and operators) that need to interact with the cloud and then separated out the functionality needed for each. From there, I’ve followed a pretty common tiered approach to the architecture (presentation, logic and resources) with two orthogonal areas (integration and management). Let’s explore each a little further:

  • As with presentation layers in more typical application architectures, components here interact with users to accept and present information. In this layer, you will find web portals to provide graphical interfaces for non-developers and API endpoints for developers. For more advanced architectures, you might find load balancing, console proxies, security and naming services present here also.

  • The logic tier would provide the intelligence and control functionality for our cloud. This tier would house orchestration (workflow for complex tasks), scheduling (determining the mapping of jobs to resources), policy (quotas and such), image registry (metadata about instance images), and logging (events and metering).

  • There will need to be integration functions within the architecture. It is assumed that most service providers will already have customer identity and billing systems. Any cloud architecture would need to integrate with these systems.

  • As with any complex environment, we will need a management tier to operate the environment. This should include an API to access the cloud administration features as well as some forms of monitoring. It is likely that the monitoring functionality will take the form of integration into an existing tool. While I’ve highlighted monitoring and an admin API for our fictional provider, in a more complete architecture you would see a vast array of operational support functions like provisioning and configuration management.

  • Finally, since this is a compute cloud, we will need actual compute, network and storage resources to provide to our customers. This tier provides these services, whether they be servers, network switches, network attached storage or other resources.

With this model in place, let’s shift gears and look at OpenStack Nova’s logical architecture.

OpenStack Nova Logical Architecture

Now that we’ve looked at a proposed conceptual architecture, let’s see how OpenStack Nova is logically architected. Since Cactus is the newest release, I will concentrate there (which means if you are viewing this after around July 2011, this will be out of date). There are several logical components of OpenStack Nova architecture but the majority of these components are custom written python daemons of two varieties:

  • WSGI applications to receive and mediate API calls (nova-api, glance-api, etc.)
  • Worker daemons to carry out orchestration tasks (nova-compute, nova-network, nova-schedule, etc.)

However, two essential pieces of the logical architecture are neither custom written nor Python based: the messaging queue and the database. These two components facilitate the asynchronous orchestration of complex tasks through message passing and information sharing. Putting this all together, we get a picture like this:

OpenStack Nova Logical Architecture

This diagram is complicated but not overly informative, as it can be summed up in three sentences:

  • End users (DevOps, Developers and even other OpenStack components) talk to nova-api to interface with OpenStack Nova
  • OpenStack Nova daemons exchange info through the queue (actions) and database (information) to carry out API requests
  • OpenStack Glance is basically a completely separate infrastructure with which OpenStack Nova interfaces through the Glance API

Now that we see the overview of the processes and their interactions, let’s take a closer look at each component.

  • The nova-api daemon is the heart of OpenStack Nova. You may see it illustrated in many pictures of OpenStack Nova as API and “Cloud Controller”. While this is partly true, the cloud controller is really just a class (specifically the CloudController in trunk/nova/api/ec2/cloud.py) within the nova-api daemon. It provides an endpoint for all API queries (either OpenStack API or EC2 API), initiates most of the orchestration activities (such as running an instance) and also enforces some policy (mostly quota checks).
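
To make that request/quota/dispatch pattern concrete, here is a toy WSGI application in the spirit of nova-api. The route, the quota limit and the response shapes are all invented for illustration; this is not Nova’s actual routing code.

```python
import json

# Hypothetical quota and in-memory state, standing in for Nova's database.
QUOTA_MAX_INSTANCES = 10
running_instances = []

def application(environ, start_response):
    """A toy WSGI app: accept an API call, enforce a quota, 'launch' a server."""
    if environ.get("PATH_INFO") == "/servers" and environ.get("REQUEST_METHOD") == "POST":
        if len(running_instances) >= QUOTA_MAX_INSTANCES:
            # Policy enforcement (a quota check) happens at the API layer
            start_response("403 Forbidden", [("Content-Type", "application/json")])
            return [json.dumps({"error": "quota exceeded"}).encode()]
        running_instances.append({"id": len(running_instances) + 1})
        # Real nova-api would now cast a message onto the queue for the workers
        start_response("202 Accepted", [("Content-Type", "application/json")])
        return [json.dumps({"server": running_instances[-1]}).encode()]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [json.dumps({"error": "not found"}).encode()]
```

The 202 Accepted response reflects the asynchronous style of the architecture: the API endpoint accepts the request, and worker daemons actually carry it out.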

  • The nova-schedule process is conceptually the simplest piece of code in OpenStack Nova: it takes a virtual machine instance request from the queue and determines where it should run (specifically, which compute server host it should run on). In practice, however, I am sure this will grow to be the most complex, as it needs to factor in the current state of the entire cloud infrastructure and apply complicated algorithms to ensure efficient usage. To that end, nova-schedule implements a pluggable architecture that lets you choose (or write) your own algorithm for scheduling. Currently, there are several to choose from (simple, chance, etc.) and it is an area of hot development for future releases of OpenStack Nova.
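
The pluggable idea can be sketched in a few lines: every driver implements the same interface, and the deployment picks one. The class and method names below are illustrative, not Nova’s actual driver interface.

```python
import random

class ChanceScheduler:
    """Pick a random host, in the spirit of the 'chance' driver."""
    def schedule(self, hosts):
        return random.choice(list(hosts))

class SimpleScheduler:
    """Pick the least loaded host, in the spirit of the 'simple' driver."""
    def schedule(self, hosts):
        # hosts maps host name -> number of instances currently running
        return min(hosts, key=hosts.get)

def place_instance(scheduler, hosts):
    # nova-schedule would pull the request from the queue, then delegate here
    return scheduler.schedule(hosts)
```

Swapping algorithms is then just a matter of handing a different scheduler object to the same dispatch code.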

  • The nova-compute process is primarily a worker daemon that creates and terminates virtual machine instances. The process by which it does so is fairly complex (see this blog post by Laurence Luce for the gritty details) but the basics are simple: accept actions from the queue and then perform a series of system commands (like launching a KVM instance) to carry them out while updating state in the database.
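
That accept-then-act loop can be caricatured as follows. The message shapes, state names and in-memory stand-ins for the queue and database are all invented; real nova-compute shells out to a hypervisor and writes to the shared database.

```python
from collections import deque

instance_state = {}   # stands in for instance rows in the Nova database
actions = deque()     # stands in for the AMQP queue

def run_instance(instance_id):
    # Real code would issue system commands here (e.g. launch a KVM guest)
    instance_state[instance_id] = "running"

def terminate_instance(instance_id):
    instance_state[instance_id] = "terminated"

HANDLERS = {"run_instance": run_instance, "terminate_instance": terminate_instance}

def process_one_message():
    """Pop one action off the queue, perform it, and update state."""
    method, instance_id = actions.popleft()
    HANDLERS[method](instance_id)
```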

  • As you can gather by the name, nova-volume manages the creation, attachment and detachment of persistent volumes to compute instances (similar functionality to Amazon’s Elastic Block Storage). It can use volumes from a variety of providers such as iSCSI or AoE.

  • The nova-network worker daemon is very similar to nova-compute and nova-volume. It accepts networking tasks from the queue and then performs tasks to manipulate the network (such as setting up bridging interfaces or changing iptables rules).

  • The queue provides a central hub for passing messages between daemons. It is currently implemented with RabbitMQ, but it could theoretically be any AMQP message queue supported by the Python amqplib.
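
The fire-and-forget messaging pattern the daemons use looks roughly like this. Here an in-memory queue per topic stands in for the RabbitMQ exchange; the topic names mirror Nova’s convention, but the plumbing is purely illustrative.

```python
import json
import queue

# One queue per topic, standing in for the AMQP broker's topic exchange.
topics = {"compute": queue.Queue(), "network": queue.Queue()}

def cast(topic, method, **kwargs):
    """Fire-and-forget: serialize the request and drop it on the topic."""
    topics[topic].put(json.dumps({"method": method, "args": kwargs}))

def consume(topic):
    """A worker daemon would loop on this, blocking on the broker."""
    return json.loads(topics[topic].get_nowait())
```

An API daemon casts a message and returns immediately; the worker on the other end of the topic picks it up whenever it is ready, which is what makes the orchestration asynchronous.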

  • The SQL database stores most of the build-time and run-time state for a cloud infrastructure. This includes the instance types that are available for use, instances in use, networks available and projects. Theoretically, OpenStack Nova can support any database supported by SQLAlchemy, but the only databases currently in wide use are sqlite3 (only appropriate for test and development work), MySQL and PostgreSQL.
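
As a feel for the kind of state involved, here is a drastically simplified instances table using stdlib sqlite3. Nova actually defines its tables through SQLAlchemy models; this schema and these column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instances (
        id TEXT PRIMARY KEY,
        instance_type TEXT,   -- e.g. an m1.small-style flavor
        host TEXT,            -- compute node chosen by the scheduler
        state TEXT            -- e.g. building, running, terminated
    )
""")

def record_instance(instance_id, instance_type, host):
    conn.execute("INSERT INTO instances VALUES (?, ?, ?, 'building')",
                 (instance_id, instance_type, host))

def set_state(instance_id, state):
    # Worker daemons update rows like this as an instance's lifecycle advances
    conn.execute("UPDATE instances SET state = ? WHERE id = ?",
                 (state, instance_id))
```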

  • OpenStack Glance is a separate project from OpenStack Nova, but as shown above, complementary. While it is an optional part of the overall compute architecture, I can’t imagine that most OpenStack Nova installations won’t be using it (or a complementary product). There are three pieces to Glance: glance-api, glance-registry and the image store. As you can probably guess, glance-api accepts API calls, much like nova-api, and the actual image blobs are placed in the image store. The glance-registry stores and retrieves metadata about images. The image store can be a number of different object stores, including OpenStack Swift.
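
The division of labor among those three pieces can be illustrated with a toy in-memory version. The function names and metadata fields are invented, not Glance’s real API; the point is only that metadata and blobs live in different places.

```python
registry = {}      # glance-registry's job: image metadata
image_store = {}   # the backing object store (could be Swift, S3, a filesystem)

def add_image(image_id, name, blob):
    """What glance-api does conceptually on upload: split metadata from blob."""
    registry[image_id] = {"name": name, "size": len(blob)}
    image_store[image_id] = blob

def get_image(image_id):
    """On retrieval, metadata comes from the registry, bytes from the store."""
    return registry[image_id], image_store[image_id]
```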

  • Finally, another optional project that we will need for our fictional service provider is a user dashboard. I have picked the OpenStack Dashboard here, but there are also several other web front ends available for OpenStack Nova. The OpenStack Dashboard provides a web interface into OpenStack Nova to give application developers and devops staff similar functionality to the API. It is currently implemented as a Django web application.

This logical architecture represents just one way to architect OpenStack Nova. With its pluggable architecture, we could easily swap out OpenStack Glance with another image service or use another dashboard. In the coming releases of OpenStack, expect to see more modularization of the code, especially in the network and volume areas.

Nova Conceptual Mapping

Now that we’ve seen a conceptual architecture for a fictional cloud provider and examined the logical architecture of OpenStack Nova, it is fairly easy to map the OpenStack components to the conceptual areas to see what we are lacking:

OpenStack Nova conceptual coverage

As you can see from the illustration, I’ve overlaid logical components of OpenStack Nova, Glance and Dashboard to denote functional coverage. For each of the overlays, I’ve added the name of the logical component within the project that provides the functionality. While all of these judgements are highly subjective, you can see that we have a majority coverage of the functional areas with a few notable exceptions:

  • The largest gap in our functional coverage is logging and billing. At the moment, OpenStack Nova doesn’t have a billing component that can mediate logging events, rate the logs and create/present bills. That being said, most service providers will already have one (or many) of these so the focus is really on the logging and integration with billing. This could be remedied in a variety of ways: augmentations of the code (which should happen in the next release “Diablo”), integration with commercial products or services (perhaps Zuora) or custom log parsing.

  • Identity is also a point which will likely need to be augmented. Unless we are running a stock LDAP for our identity system, we will need to integrate our solution with OpenStack Nova. Having said that, this is true of almost all cloud solutions.

  • The customer portal will also be an integration point. While OpenStack Nova provides a user dashboard (to see running instances, launch new instances, etc.), it doesn’t provide an interface to allow application owners to sign up for service, track their bills and lodge trouble tickets. Again, this is probably something that is already in place at our imaginary service provider.

  • Ideally, the Admin API would replicate all functionality that we’d be able to do via the command line interface (which in this case is mostly exposed through the nova-manage command). This will get better in the Diablo release with the Admin API work.

  • Cloud monitoring and operations will be an important area of focus for our service provider. A key to any good operations approach is good tooling. While OpenStack Nova provides nova-instancemonitor, which tracks compute node utilization, we’re really going to need a number of third party tools for monitoring.

  • Policy is an extremely important area but very provider specific. Everything from quotas (which are supported) to quality of service (QoS) to privacy controls can fall under this. I’ve given OpenStack Nova partial coverage here, but that might vary depending on the intricacies of the provider’s needs. For the record, OpenStack Nova Cactus provides quotas for instances (number and cores used), volumes (size and number), floating IP addresses and metadata.

  • Scheduling within OpenStack Nova is fairly rudimentary for larger installations today. The pluggable scheduler supports chance (random host assignment), simple (least loaded) and zone (random nodes within an availability zone). As with most areas on this list, this will be greatly augmented in Diablo. In development are distributed schedulers and schedulers that understand heterogeneous hosts (for support of GPUs and differing CPU architectures).

As you can see, OpenStack Nova provides a fair basis for our mythical service provider, as long as we are willing to do some integration here and there. In my next post, I’ll dive deeper into OpenStack Nova with a discussion on deployment architecture choices.

