How to manage embedded software development for startups — with Aegis

This page was inspired by the following article:

Hardison, O. (2004), How to manage software development for startups, Embedded Systems Programming, 15-Nov-2004.

Introduction

Even if you're not involved in a startup, these best practices for running a small organization on a tight budget may help you critique your own organization. How to hold meetings, what tools to buy, and what to skip — it's all here straight from one who's learned the hard lessons.

Although software development processes are numerous and well documented, they don't always fit the time and resource constraints of an embedded systems startup company. Often a startup's software team follows a leaner path to achieve its goal: get a product out on time with no major defects. These teams frequently use open source tools and a mix of hybrid software development life cycle techniques, but there are as many nuanced processes as there are embedded systems startups.

What works? This article provides a set of software development processes that work well in the context of an embedded systems startup. These include:

The early stage setup of the software configuration management and build policies;
Activities during the development stage, from the functional requirements document to first customer ship.

While some of these steps may seem obvious, each step described is vital to achieving the stated goal.

Minimal tools required

Starting with the assumption that startups are notoriously cheap organizations, here is a list of open source tools that startups would do well to have at their disposal:

Software configuration management (SCM) system, such as Aegis;
Scripting and build tools for builds and reports, such as Perl and Cook;
Bug-tracking or problem-tracking database, such as Bugzilla or GNATS;
A cross compiler, such as the GNU Compiler Collection (GCC), which can be configured to target dozens of CPUs;
Other useful open source tools for embedded development, such as SRecord.

Your software configuration management (SCM) tools and bug-tracking database should preferably have a web-based query and reporting interface. It is not necessary to export this interface to the world; having it available on the company's intranet is sufficient.

Configuration management policies

Before starting to develop the product, the engineering organization needs a foundation on which to reliably build and release software. This foundation should include, at a minimum, support for documentation, source code, and release tracking.

Document in SCM and on the web

You should make sure that all documentation is source code controlled and also available via the company's intranet. This step makes it easy to track, change, and find the latest design and module information. The documentation should include some means of displaying source changes and histories so that interested parties (for example, the quality assurance and documentation groups) can review and note code changes. The Aegis web interface (aeget) does just this. It provides a web interface that displays source history and differences between code revisions, and much more besides.

You should also have project plan documents (source controlled in the same repository as the source code) that contain the overall structure of each major module within the product. The project plan should point out the changes to be added to a module in each release. This information reduces architectural and release-oriented questions that might otherwise be foisted on engineers during the cycle, thus freeing them up from low-priority interrupts. One of the major jobs of engineering managers is to protect their engineers from external information and feature requests. By having all pertinent information available online, a manager will reduce some of these requests; others can usually be handled by the manager himself.

Plan the release and the merge

A major release should be represented as a branch off of the current Aegis trunk. Minor and patch releases should be branches off of existing release branches. Spell out your policies regarding ending release and support branches (implying when merges back to the main branch occur). Do this in a source controlled project planning document, ideally available via an intranet web page. Aegis provides automatic ways to determine the extent of merging required between a branch and its parent.

For example, you may use the Linux model, and have odd numbered branches be development branches, and even numbered branches be release branches.

Include a reference to a bug ID in change set descriptions

Every change set has a text description, to convey information concerning the change set's purpose. You should add a line to this description containing the bug ID for the bug tracking system you are using, for every change set that fixes a bug, or is tracked by your issue management system.
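A convention like this pays off only if scripts can read it back, for example when generating reports or notification email. The following is a minimal sketch of such a lookup. The "BugId:" tag and the function name find_bug_id are assumptions made for illustration; they are not part of Aegis or of any bug tracker.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* "BugId:" is an assumed tag; use whatever your team agrees on. */
#define BUG_TAG "BugId:"

/*
 * Scan a change set description line by line and return the number that
 * follows the first BUG_TAG found at the start of a line, or -1 if the
 * description carries no bug ID.
 */
long find_bug_id(const char *description)
{
    const size_t tag_len = strlen(BUG_TAG);
    const char *line = description;

    assert(description != NULL);
    while (*line != '\0') {
        if (strncmp(line, BUG_TAG, tag_len) == 0) {
            const char *p = line + tag_len;
            while (*p == ' ' || *p == '\t')
                p++;                      /* skip blanks after the tag */
            if (isdigit((unsigned char)*p))
                return strtol(p, NULL, 10);
        }
        line = strchr(line, '\n');        /* advance to the next line */
        if (line == NULL)
            break;
        line++;
    }
    return -1;
}
```

Requiring the tag at the start of a line keeps the check cheap and avoids matching prose that merely mentions a bug number.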

Alternatively, each change can be assigned arbitrary text attributes, and a “BugId” attribute is not only searchable, but will automatically appear in change set listings (use a lower case name if you don't want your change set lists polluted).

It is also simple to set up an email notification for every integration, not only to the software development team, but also to the documentation and quality assurance teams.

Once you have two or more developers, it is a good idea to turn on Aegis' code review features, which ensure that every change set has been code reviewed before it is integrated. Aegis works with many code review tools, such as tkdiff for color coded GUI code reviews.

Aegis also provides after-the-fact differences via the command line, or via the web interface (see the Aegis web site for an example of the information available about each change set).

Build policies

With the source controlled, it's time to tame the build. How often will you build? What about identifying each software configuration? What type of notification will take place each time an integration is completed (or fails)?

Aegis attempts to ensure that all change sets integrated into the project “work”. This usually includes some concept of “compiles successfully and passes some tests”.

The project build is configured with an automated build command which is required to report success before your change can be put up for code review and eventual integration. This tends to dramatically reduce the number of commits which result in the entire development team being unable to build the product.

Configuration Identifiers

Every single integration is given a unique identifier by Aegis. By burying this identifier (a short text string) in the software at integration build time, it is possible later to query the software for this version string and recreate the entire state of the project source code at the time of integration.
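One common way to bury an identifier in a C image is a string constant whose value is supplied on the compiler command line. The sketch below assumes the build command passes the identifier as a macro named BUILD_ID; that name, the "build-id: " prefix, and the identifier format in the comment are all illustrative choices, not something Aegis prescribes.

```c
#include <assert.h>
#include <string.h>

/*
 * BUILD_ID is a hypothetical macro name. The integration build command is
 * assumed to define it on the compiler command line, for example
 *     -DBUILD_ID='"1.2.D034"'
 * (the identifier format shown is illustrative). A private developer build
 * that does not define it falls back to "unknown".
 */
#ifndef BUILD_ID
#define BUILD_ID "unknown"
#endif

#define BUILD_ID_PREFIX "build-id: "

/* The fixed prefix makes the string easy to find in a raw flash image. */
static const char build_id_string[] = BUILD_ID_PREFIX BUILD_ID;

/* Return only the identifier, e.g. for a debug console "version" command. */
const char *build_id(void)
{
    /* An empty identifier would make the image untraceable. */
    assert(sizeof BUILD_ID > 1);
    /* sizeof counts the prefix's terminating NUL, hence the "- 1". */
    return build_id_string + (sizeof BUILD_ID_PREFIX - 1);
}
```

Reporting build_id() on the debug console and in bug reports lets QA tie any observed behavior back to a specific integration.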

Because every integration builds the code, there is no need for a nightly automated build: it has already been done. The results are available for all to see (and use, recalling that it works) in the branch's baseline.

Mandate email notification of integrations

For any integration completed, an email notification should be sent to the entire software group. It is simple to configure Aegis to do this if you want plain text; a little more scripting is necessary for HTML email with links to various intranet web sources of information.

The email for each change set should include a listing of all files integrated into the branch. In this way, engineers know what parts of the system have changed or what bugs were potentially fixed. To increase the report's usefulness to QA especially, the generating script should include the configuration identifier.

Automated Testing

You may think that automated testing for embedded projects takes far too many resources, and far too much engineering effort, to be worthwhile for a startup. Testing on the build host keeps that cost low.

The build process typically produces a cross compiled image of the software to be downloaded into the target system. A very useful approach is to have the build process also compile a native executable for the build host.

Such an executable, while not cycle-identical to the target (in fact, it will frequently be considerably faster, and run on a different CPU), can be automatically tested via scripts on the build host, for very little additional engineering effort.

For example: by using conditional C macros for all I/O accesses, the target code accesses memory directly, but the build host version hands the access off to a device simulator, effectively allowing you to exercise the logic all the way to the bottom of your device drivers. When you run the executable, the console can be attached to the debug port simulator, giving an authentic (and scriptable) interface to the software being constructed.
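The conditional macro technique described above might look like the following minimal sketch. Everything named here is an assumption made for illustration: the TARGET_BUILD switch, the IO_READ8 and IO_WRITE8 macros, the sim_ functions, and the UART register map are hypothetical, not taken from any real board or library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef TARGET_BUILD
/* Target build: touch the memory-mapped register directly. */
#define IO_READ8(addr)       (*(volatile uint8_t *)(uintptr_t)(addr))
#define IO_WRITE8(addr, val) (*(volatile uint8_t *)(uintptr_t)(addr) = (uint8_t)(val))
#else
/* Build host: hand every access off to a device simulator. */
uint8_t sim_read8(uint32_t addr);
void sim_write8(uint32_t addr, uint8_t value);
#define IO_READ8(addr)       sim_read8(addr)
#define IO_WRITE8(addr, val) sim_write8((addr), (uint8_t)(val))
#endif

/* Hypothetical UART register map. */
#define UART_STATUS   0x4000u
#define UART_TXDATA   0x4001u
#define UART_TX_READY 0x01u

/* Driver code: the same source compiles for the target and the host. */
void uart_putc(char c)
{
    while ((IO_READ8(UART_STATUS) & UART_TX_READY) == 0)
        ;                                /* wait for the transmitter */
    IO_WRITE8(UART_TXDATA, (uint8_t)c);
}

#ifndef TARGET_BUILD
/* A deliberately tiny simulator: it captures transmitted bytes in a log. */
static char sim_tx_log[64];
static size_t sim_tx_len;

uint8_t sim_read8(uint32_t addr)
{
    assert(addr == UART_STATUS);         /* catch reads of unmapped registers */
    return UART_TX_READY;                /* simulated transmitter is always ready */
}

void sim_write8(uint32_t addr, uint8_t value)
{
    assert(addr == UART_TXDATA);         /* catch writes to unmapped registers */
    if (sim_tx_len < sizeof sim_tx_log)
        sim_tx_log[sim_tx_len++] = (char)value;
}
#endif
```

On the build host, a test can call uart_putc() and then inspect sim_tx_log, which exercises the driver logic without any hardware. A fuller simulator would model register state and timing as far as your tests need.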

Unit tests are equally useful on the build host, because they contain only an isolated portion of the complete code, without enough of the operating system or run time environment to be viable on the target system.
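A host-side unit test needs very little machinery. The sketch below tests a hypothetical checksum8 routine, standing in for any isolated piece of your firmware; the function names and test cases are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Code under test: a hypothetical 8-bit additive checksum. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;

    assert(data != NULL || len == 0);
    while (len-- > 0)
        sum = (uint8_t)(sum + *data++);  /* wraps modulo 256 */
    return sum;
}

/* Report one failed check on stderr; return 1 on failure, 0 on success. */
static int check(const char *name, int ok)
{
    if (!ok)
        fprintf(stderr, "FAILED: %s\n", name);
    return ok ? 0 : 1;
}

/* Run every check and return the number of failures. */
int run_unit_tests(void)
{
    static const uint8_t small[] = { 1, 2, 3 };
    static const uint8_t wraps[] = { 0xFF, 0x02 };
    int failures = 0;

    failures += check("empty buffer", checksum8(small, 0) == 0);
    failures += check("simple sum", checksum8(small, sizeof small) == 6);
    failures += check("wraps modulo 256", checksum8(wraps, sizeof wraps) == 0x01);
    return failures;
}
```

A main() on the build host would return a nonzero exit status when run_unit_tests() reports failures, so that a test script can detect the failure automatically.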

Aegis provides facilities for supporting optional or mandatory tests accompanying each change set. By using automated tests in this fashion, many bugs are caught and squished long before the code is downloaded into the target system for real world testing (often, for a small startup, tediously manual).

These tests accumulate in the project, and there is a simple Aegis command to run the accumulated set of tests in a developer's private work area before putting a change set up for review and eventual integration. Projects rapidly accumulate such tests, and it can become time consuming to run them all, so a subset can be designated as a “smoke test” and a shell script (source controlled, of course) made available to developers to run them.

QA and RC branches

Because the development branch “works” to the extent that you have configured Aegis to look for, many QA activities can be validly performed out of the development branch's baseline. This can be sufficient for a small start-up.

For added insulation against a rapidly changing development branch (or one that project management has decided to “break” for a short time), a QA branch can be opened, with regular updates from the development branch, either via Aegis' branch inheritance mechanism, or more manually via Aegis' various branch and change set cloning and reproduction tools.

As on any other branch, the usual build-test-review-integrate cycle is performed, giving opportunities for email notification, but also automatic on-demand builds of the product.

Release and support branches are most easily handled by ending the development branch (a branch is just a super change set in Aegis) and creating two sister branches: an even numbered branch for release and support, and an odd numbered branch for ongoing development and QA.

Engineering and QA policies

Developers are buzzing away on features and fixes, you've got the automated build and tests happening, so now it's time to engage the QA people. Some organizations start QA validation fairly early in the cycle; others wait for a preordained code freeze milestone before commencing testing. Your mileage may vary, and you might be short changed if QA is working on other releases concurrently. Once you do decide to engage QA, here are some tips on what to expect.

QA and later-cycle RC builds

Engineers use the integration builds when testing embedded systems on their target processor and the build host. The code is first tested on the build host using an automated test script written by the developer, possibly through a unit test harness. Once this indicates that the code probably works, it can be downloaded onto the target and debugged from that point.

Change sets are promoted to the QA branch on a regular basis so that QA may perform their test suites and validate any fixed bugs. This can be automated using Aegis by having the QA branch be a child branch of the development branch. By building and integrating an empty change set, the QA branch's baseline inherits all of the change sets applied to the parent (development) branch since the last QA integration build. As a rule of thumb, a QA import is appropriate once a week so that QA has enough time to run their test suites and the queue of fixed bugs doesn't grow too large. The frequency of QA imports depends on the size of the project.

Ideally, QA will write automated test scripts so that bugs can be reproduced, and the test scripts are imported into the development branch to be run by developers to make sure that once a bug is fixed, it stays fixed. A particularly important test may be worth adding to the smoke test script.

RC branches should be separate from development branches; ideally you can end the development branch, and the new RC branch will inherit all changes. Incomplete change sets that will not make it into this release can be cloned onto the new development branch just before the old development branch ends.

It is a good idea to archive RC build products (such as flash images and user documentation sets) so that an exact copy of the user package is readily available. This is often necessitated by statistical FPGA compilers, which may not produce the same answer even when given the same inputs. The archives are also extremely useful for testing older releases and release candidates for the existence of some classes of difficult bugs.

Finally, both QA and RC builds should have their deltas named in Aegis so they can be reproduced by name, not just by the ID buried in the executable. Having delta names also helps generate the various reports so that the engineering team and QA can see what fixes made it into each baseline.

Turning knobs — time vs. resources

Which knob are you going to turn? Do you have the luxury of time in your schedule to take your project to zero defects, or will you be forced to settle for zero fatal and major defects? Can you move resources either within QA or into QA to help the time crunch? The more challenging part of a release is the trade off when confronted with a time-constrained release or resource-constrained release. While there are no short answers to problems such as these, once you know where you are and where you should be, you can gauge what knobs to turn. The key is understanding your current position.

Hold regular meetings

Hold regular meetings to consider the state of the release. You should require two types of meetings: the daily bug meeting and the weekly product review meeting.

Daily bug meeting

The first meeting goes under different names at different companies, but for our purposes I'll call it a daily bug meeting. Whatever you call it, the meeting's goal is to take a regular and frequent (daily is the norm) snapshot of the current bug state of a release. Early in the cycle numerous bugs may be on the list, so you should concentrate on those with a high priority.

The first part of the daily bug meeting should be devoted to the bug review. The meeting coordinator (either the project manager or the engineering manager in charge) runs a projector that displays a summary and individual bug reports. The latter part of the meeting should look at the bug queues (new, open, fixed) and outstanding issues to determine how the release is progressing along. The benefits of a regular review are many:

All of the responsible parties are up-to-date on the state of the release, so it makes sense to include a representative from each of the stakeholders for the release (engineering, QA, product marketing, program management, manufacturing, and documentation if possible);
Bug priorities can be assigned (later in the cycle this becomes a critical determinant of whether a bug will be fixed before shipping);
Late feature requests can be judged for merit.

Note that some projects are so small that this meeting can quite profitably be replaced with the project manager looking at new bugs since yesterday, assigning them priorities, and dispatching them to the responsible developer by email. Even in large organizations this can be very effective. The point is to deal with bugs rapidly, rather than let them accumulate until just before a release, when you have no resources to spare to fix them. If you have to have a daily meeting, consider holding it in a room without chairs; it is amazing how short meetings can be when held this way.

Weekly product review meeting

The second required meeting is a product review meeting, during which all of the stakeholders are required to meet and review the progress of the product's release as a whole. This weekly meeting should be called by the marketing product manager and should be attended by the same people as the daily bug meeting plus some senior marketing and manufacturing heads. The purpose is to ensure the release stays on schedule and that the disparate groups coordinate their efforts (is the web site up, is the marketing collateral in shape, will the product launch at any trade shows?).

In small startups, this meeting can be very small and frequently consists of only the team leader or project manager representing the developers (and he's up-to-date because he reads the integration email notifications), plus QA and marketing.

Toward the end game

So you have these wonderful policies implemented and your process is humming, new source is being developed, and at some point the QA builds start. What now? There should be some time in the schedule for a code freeze. That point may be determined by some heuristic like a graph showing open bugs vs. closed over time, or it may be the hard-and-fast requirement of a predetermined schedule. It all depends on what knobs you're able to turn. At code freeze, the bar is set higher: software engineers are advised to check in only fixes for bugs at or above a threshold priority. QA must turn around and verify bug fixes with a high degree of reliability for both the fixes themselves and for the time that it takes to test them.

Open vs. fixed graph

A graph that displays open vs. fixed issues can provide insight on stability of the release (number of open issues) and QA loading (number of fixed bugs waiting to be closed). For example, consider the graph in Figure 3; the open issues are being handled even though there are still five waiting for a fix. The to-be-fixed queue is rising, indicating that QA is lagging in closing out bugs. A decision maker might look at this and decide to move some QA resources to help close out this release.

Aegis can provide histograms of change sets making their way through the process, or you can mine Aegis' meta-data and make your own. Note, however, that this is implementation focused.

Another source of data, and probably more issue focused, is the bug state information from your bug tracking tools, such as Bugzilla. Statistics like these can also be graphed automatically on a web page through some CGI scripting.
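The two numbers that the open vs. fixed graph plots can be derived from a simple tally of bug states. The sketch below assumes each bug has been reduced to one of four states (new, open, fixed, closed); the type and function names are hypothetical, and how you export states from your bug tracker is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bug life cycle: new -> open -> fixed -> closed (verified by QA). */
enum bug_state { BUG_NEW, BUG_OPEN, BUG_FIXED, BUG_CLOSED, BUG_STATE_COUNT };

struct bug_tally {
    unsigned count[BUG_STATE_COUNT];
};

/* Count how many bugs are in each state for one daily snapshot. */
struct bug_tally tally_bugs(const enum bug_state *states, size_t n)
{
    struct bug_tally tally = { { 0 } };
    size_t i;

    for (i = 0; i < n; i++) {
        assert((unsigned)states[i] < BUG_STATE_COUNT);
        tally.count[states[i]]++;
    }
    return tally;
}

/* Bugs still waiting for a fix: an indicator of release stability. */
unsigned unresolved(const struct bug_tally *tally)
{
    return tally->count[BUG_NEW] + tally->count[BUG_OPEN];
}

/* Bugs fixed but not yet verified and closed: an indicator of QA loading. */
unsigned awaiting_qa(const struct bug_tally *tally)
{
    return tally->count[BUG_FIXED];
}
```

Recording unresolved() and awaiting_qa() once a day gives the two series for the graph. A rising awaiting_qa() value with a flat unresolved() value is the situation in which moving resources into QA is worth considering.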

Beta

To get to this stage and have a product ready for customer trial, only minor issues should be outstanding (or at most one or two major issues with workarounds). The open vs. fixed graph should be trending downward, and the frequency of bug meetings should be reduced to once or twice a week. The majority of the discussion should be on coordination issues with the documentation group on how to characterize any lingering issues. Most of the engineering group should have already started on the next release, with some possibly working on a follow-up maintenance or patch release and others on the next major release.

First customer ship

Ah, you are done. Or are you? Congratulate yourself and your team and then get ready to start pushing the rock uphill once more. Often the release team should host a post-release meeting to go over what did and did not succeed during the cycle. This is a great time to fine tune your release process.

Along the way you might consider some additional questions. Have you started on a maintenance release? Have you checked with your sales or support engineer (quite a valuable and underutilized resource for internal status checks) on how the release is being accepted? Has anyone considered a customer survey to see if the release delivered what it promised or if it missed the mark?

Too Much Process

It is possible that the process described here is much too heavy. Maybe your startup has only a couple of developers and a one-man CEO slash marketing slash sales group.

The software development process described here (automated builds, full code reviews, host based automated testing) works, even when (especially when?) there is only a small number of developers, and it scales to accommodate your growth. Aegis does most of the work, releasing developers from paper-work hell. The automation of builds and the validations provided by tests and code reviews give you needed stability so that your team can focus on actually making your product.

The beautiful graphics on this web site are by Grégory Delattre.
