Simplifying Your Software Code Audit

Summary:
Software code audits can be arduous and time-consuming, because today’s software projects use a mix of proprietary, commercial, and open source software. This article outlines several methods to simplify and streamline the audit process and describes best practices for organizing, documenting, labeling, tracking, and managing the open source and third-party content brought into software portfolios.

Sooner or later, every software organization will go through an intellectual property (IP) ownership assessment, traditionally known as a software audit. The assessment could be due to a pending mergers and acquisitions (M&A) event, product delivery to a customer, a tech transfer between organizations, or simply for maintaining internal records in anticipation of a legal or investment event.

The audit process can be arduous, depending on the organization of the code portfolio, quality of the records, and level of documentation. This article highlights a number of steps that, if adopted, can significantly reduce the time and effort involved in a software audit.

Why Perform a Software Code Audit?

Gone are the days when an organization wrote all of the software in its product. Today’s software projects use a mix of proprietary, commercial, and open source software, created and combined by teams of tens or hundreds of developers working across different locations and time zones. Open source has significantly accelerated the pace of commercial software development, even though many organizations still lack a structured open source software adoption process. Outsourced software and code delivered by third-party contractors add another layer of complexity to an organization’s code portfolio. In most cases, code composition and code pedigree are left undocumented.

Uncertainty around code composition and IP ownership within an organization increases risk, threatens business operations, deters buyers, delays time to market, exposes the company and its downstream clients to possible litigation, limits the ability to form partnerships, lowers company valuations, and can derail M&A activities.

A software code audit is a discovery process that removes uncertainty around IP ownership. Automated code scanning and identification applications augment expert human analysis, separating a code portfolio into proprietary, third-party, and open source components. The scan then identifies the attributes associated with these components and reports on their pedigree. The audit identifies licensing and other legal obligations, supports open source license management, and reports known security vulnerabilities associated with the third-party components.
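To make the first-pass separation concrete, here is a minimal sketch of how a tool might sort a single file into one of those buckets using only its header. It assumes a hypothetical company name ("Example Corp") and treats an SPDX-License-Identifier tag as a sign of open source. Real scanners also match code fingerprints against large knowledge bases, which this sketch does not attempt.

```python
import re

# Hypothetical owner strings. In practice these come from the company's approved
# copyright formats, including those of acquired companies.
COMPANY_NAMES = ("Example Corp",)

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*\S+")
COPYRIGHT_RE = re.compile(r"Copyright\s*(?:\(c\)|©)?\s*[\d, \-]*\s*(.+)", re.IGNORECASE)


def classify_file(text: str, company_names=COMPANY_NAMES) -> str:
    """Rough first-pass bucket for one source file, based only on its header."""
    header = "\n".join(text.splitlines()[:30])  # headers usually sit at the top
    holders = [m.group(1).strip() for m in COPYRIGHT_RE.finditer(header)]
    if any(name in holder for holder in holders for name in company_names):
        return "proprietary"
    if SPDX_RE.search(header):
        return "open source"
    if holders:
        return "third-party"  # someone else's copyright, no recognizable license tag
    return "unknown"  # no identifying header: route to a human reviewer
```

Files that land in "unknown" are exactly the ones the expert human analysis has to resolve.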

Preparing for an Audit

The more information available to the auditors, the more easily and quickly the audit can be completed. Preparation is therefore the key to any audit, and it is not limited to software packages. The audit team needs to understand the overall purpose of the audit, as well as the commercial model around the code being audited. For example, if the code is deployed as part of a software-as-a-service (SaaS) model rather than distributed, the auditors may flag licenses whose obligations are affected by that business model.

To begin with, the audit team needs access to all the code. Ideally, the code is presented to the auditors after the build process has resolved all dependencies. The auditors also need to understand the development environment in which the code was created, including all the tools, repositories, and libraries used. If available, a best-guess list of the likely third-party components, including any open source or commercial code, is very helpful, because it allows the audit team to confirm the “known” components first. A concise list of the internal and external developers who worked on the code helps the auditors pre-accept author tags and copyright notices found in the code. A list of company acquisitions and the copyright formats each acquired company used is also beneficial when preparing for the audit.
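Confirming the “known” components first amounts to a set comparison between the declared list and what the scan finds. The sketch below assumes both lists use a hypothetical "name@version" convention. Entries that were found but not declared are the ones worth investigating first.

```python
from __future__ import annotations


def _normalize(components: set[str]) -> set[str]:
    # Compare case-insensitively and ignore stray whitespace in hand-written lists.
    return {c.strip().lower() for c in components}


def reconcile(expected: set[str], detected: set[str]) -> dict[str, set[str]]:
    """Compare the pre-audit best-guess list with what the scanner found."""
    exp, det = _normalize(expected), _normalize(detected)
    return {
        "confirmed": exp & det,   # declared and found: quick to sign off
        "unexpected": det - exp,  # found but not declared: needs investigation
        "not_found": exp - det,   # declared but not found: stale list or scanner gap
    }
```

For example, `reconcile({"zlib@1.2.13", "openssl@3.0.8"}, {"zlib@1.2.13", "log4j@2.17.1"})` confirms zlib, reports log4j as unexpected, and reports openssl as not found.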

Code Organization Matters

An auditor’s nightmare scenario is a software portfolio consisting of a single folder that contains all source and binary code files, images, workflow information, and licensing details. Such a flat organization removes the logical audit boundaries between the different pieces of software, in the same way that a flat bill of materials for a physical product removes the product’s logical structure. A systematic top-down structure (also known as a software product structure or software manifest) separates modules that share compliance characteristics and makes it easier to identify open source, third-party commercial, and proprietary components. It also helps break a large audit project into subtasks of reasonable size, which is often how an audit team approaches a project.
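One payoff of a top-down structure is that audit subtasks can be derived mechanically from the folder layout. The following is a minimal sketch under an assumed layout of `proprietary/<module>/` and `third_party/<package>/`. Those folder names and the default depth of two levels are illustrative choices, not a convention from any particular tool.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath


def split_into_work_packages(paths: list[str], depth: int = 2) -> dict[str, list[str]]:
    """Group files by their first `depth` folder levels; each group is one subtask.

    Files sitting directly at the portfolio root are grouped under "<root>",
    which is usually a symptom of flat organization.
    """
    packages: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        folders = PurePosixPath(p).parts[:-1]  # drop the file name itself
        key = "/".join(folders[:depth]) or "<root>"
        packages[key].append(p)
    return dict(packages)
```

In a flat portfolio, everything ends up in a single "<root>" group, which is the boundary-free situation described above.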

Keep the original third-party folder structures

There is another important reason for maintaining the original third-party folder structures. Unless a software file carries its own identifying copyright or licensing information (for example, in the file header or embedded in the binary), information in the containing folder, such as the license text in a LICENSE.TXT file, is taken to apply to the contents of that folder. We will look at this in more detail later in the article.
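The folder rule can be expressed as a walk up the directory tree to the nearest enclosing license file. The sketch below assumes a hypothetical set of license file names, compared case-insensitively. It is meant only for files that have no license information of their own, because file-level information takes precedence.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed set of folder-level license file names, compared case-insensitively.
LICENSE_FILE_NAMES = {"LICENSE", "LICENSE.TXT", "LICENSE.MD", "LICENCE", "LICENCE.TXT", "COPYING"}


def governing_license_files(file_path: str, all_paths: list[str]) -> list[str]:
    """License files in the nearest enclosing folder of file_path, or [] if none."""
    by_folder: dict[PurePosixPath, list[str]] = {}
    for p in all_paths:
        pp = PurePosixPath(p)
        if pp.name.upper() in LICENSE_FILE_NAMES:
            by_folder.setdefault(pp.parent, []).append(p)

    folder = PurePosixPath(file_path).parent
    while True:
        if folder in by_folder:
            return sorted(by_folder[folder])
        if folder == folder.parent:  # reached the portfolio root
            return []
        folder = folder.parent
```

If the third-party package is moved out of its original folder, this lookup silently picks up the wrong license (or none). That is why the original structure matters.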

File-Level Information Matters

Headers in source files are an invaluable source of information regarding the pedigree, ownership, licensing, and purpose of a specific file. To keep this information accurate, it is essential to retain existing headers and to clearly identify proprietary files.

Keep original open source software or commercial file headers

For proprietary software, it is essential to use a list of company-approved standard headers that include copyright information (with date), author name, and an abstract. If existing source files are modified, information may be added to the header, but existing information should never be removed. Using XML tags to identify the different pieces of information in a header makes the header machine-readable and allows accurate extraction of file-level information. Embedding human-readable copyright and licensing information in binary files (such as .exe files) helps auditors determine IP ownership and usage obligations.
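Here is how XML-tagged headers become machine-readable. The tag names `<copyright>`, `<author>`, `<abstract>`, and `<license>` are hypothetical stand-ins for whatever the company’s standard header defines. Because modified files may gain extra entries, repeated tags are collected rather than overwritten.

```python
from __future__ import annotations

import re

# Hypothetical tag vocabulary for a company-standard header.
HEADER_TAG_RE = re.compile(r"<(copyright|author|abstract|license)>(.*?)</\1>", re.DOTALL)


def parse_header_tags(source: str) -> dict[str, list[str]]:
    """Extract tagged header fields; repeated tags (e.g. several authors) are kept in order."""
    fields: dict[str, list[str]] = {}
    for tag, value in HEADER_TAG_RE.findall(source):
        # Drop comment decoration such as leading "*" on continuation lines.
        lines = [ln.strip().lstrip("*/#").strip() for ln in value.splitlines()]
        fields.setdefault(tag, []).append(" ".join(ln for ln in lines if ln))
    return fields
```

For a header containing two `<author>` entries, the result lists both authors in order, so the original author is preserved alongside the one who later modified the file.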

Folder-Level Information

One reason for maintaining or enforcing a structured code organization is to group together code files with similar IP ownership or compliance characteristics. Typically, information such as licensing, limitations, quality metrics, and dependencies is included in one form or another within a folder and applies to the contents of that folder.

With open source software, maintaining the original package structure is important. Files such as a Maven POM file, manifest files, README, COPYING, or LICENSE.TXT all provide valuable insight into the ownership, dependencies, and compliance aspects of that specific version of the open source package. It is common for an open source package to carry different licenses depending on the version, or even on the specific module of the component.
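To show how much an intact package folder offers, this sketch pulls the declared licenses and direct dependencies out of a Maven POM using only the standard library. It assumes the usual POM layout (`licenses/license/name` and `dependencies/dependency`) and handles POMs with or without the Maven namespace. Versions inherited from a parent POM are not resolved, so they show up empty.

```python
import xml.etree.ElementTree as ET


def summarize_pom(pom_xml: str) -> dict:
    """Declared licenses and direct dependency coordinates from a POM document."""
    root = ET.fromstring(pom_xml)
    # POMs normally use the Maven namespace; fall back to none if absent.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    licenses = [
        el.findtext(f"{ns}name", default="").strip()
        for el in root.findall(f"{ns}licenses/{ns}license")
    ]
    dependencies = []
    for dep in root.findall(f"{ns}dependencies/{ns}dependency"):
        coords = [dep.findtext(f"{ns}{key}", default="").strip()
                  for key in ("groupId", "artifactId", "version")]
        dependencies.append(":".join(coords))  # version may be empty if managed by a parent
    return {"licenses": licenses, "dependencies": dependencies}
```

If the POM is separated from the package, this ownership and dependency information has to be reconstructed by hand.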

Factors Impacting License Obligations

Several factors determine which obligations a license triggers. The distribution model (whether the software is distributed, used internally, or run in the cloud) affects how certain licenses, such as the different versions of the GNU General Public License (GPL), trigger obligations. With some licenses, any modifications to the third-party software (such as open source software) should also be noted, and the obligations triggered by those modifications honored. For binary-linked software bound by certain licenses (such as the GPL family), whether a library is linked statically or dynamically matters.
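These three factors (distribution model, modification, and linking) can be captured as inputs to a rule check. The rules below are a deliberately simplified and incomplete illustration, not a complete reading of any license. GPL source obligations are triggered by distribution. AGPL additionally covers modified versions offered over a network. For LGPL, static linking brings an extra relinking requirement. MIT requires notices on distribution. Any other license is routed to manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    distributed: bool      # shipped to customers as binary or source
    network_service: bool  # offered to users over a network, e.g. SaaS
    modified: bool         # the third-party code was changed
    static_link: bool      # linked statically rather than dynamically


def flag_obligations(license_id: str, u: Usage) -> list:
    """Simplified, illustrative obligation flags keyed on the SPDX id's family prefix."""
    family = license_id.split("-")[0]  # "GPL-3.0-only" -> "GPL", "MIT" -> "MIT"
    flags = []
    if family == "GPL":
        if u.distributed:
            flags.append("provide corresponding source")
    elif family == "AGPL":
        # AGPL also reaches modified versions used by remote network users.
        if u.distributed or (u.network_service and u.modified):
            flags.append("provide corresponding source")
    elif family == "LGPL":
        if u.distributed:
            flags.append("provide library source")
            if u.static_link:
                flags.append("allow relinking (e.g. ship object files)")
    elif family == "MIT":
        if u.distributed:
            flags.append("retain copyright and permission notice")
    else:
        return ["unrecognized license: manual review"]
    # The GPL family asks that changed files carry notices of the modification.
    if u.modified and flags and family in {"GPL", "AGPL", "LGPL"}:
        flags.append("mark modified files")
    return flags
```

The same modified GPL component yields no flags when it runs only as a SaaS backend, but it yields source and marking obligations once it ships. This is why the auditors need to know the commercial model.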

Auditors are generally wary of code dependencies. A seemingly harmless piece of software could, at build time, pull in many other pieces of code that may not be aligned with the commercial objectives of the organization. Access to the build information or to the post-build code portfolio gives a more accurate view of the software. The latter has the added advantage that spurious files, such as test files, that would not be part of the final product are excluded from the audit.
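The way one “harmless” component pulls in others can be seen by walking its dependency graph. The sketch below assumes a hypothetical graph in which each edge carries a scope label, as build tools such as Maven do. It skips test-scoped edges, mirroring how a post-build view leaves out code that never reaches the final product.

```python
from collections import deque


def transitive_dependencies(graph, root, excluded_scopes=frozenset({"test"})):
    """All components reachable from root, skipping edges in excluded scopes.

    graph maps a component name to a list of (dependency, scope) pairs.
    """
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep, scope in graph.get(node, []):
            if scope in excluded_scopes or dep in seen:
                continue
            seen.add(dep)
            queue.append(dep)
    seen.discard(root)  # a dependency cycle could lead back to the root
    return seen
```

With this graph, declaring only `web-framework` still brings `json-lib` and `template-engine` into the product, while the test-only `assert-lib` stays out.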

Auditing Is a Continuous Process

A software audit is essentially a discovery process aimed at gaining insight into an organization’s software composition and identifying the attributes of its code components. It helps with open source license management and identifies open source security vulnerabilities. The knowledge gained from an audit helps anticipate potential quality, compliance, and security issues in current and future products. As with other quality processes, the earlier defects are detected and fixed, the lower the cost of fixing them and the smaller their impact.

Automated tools for scanning, inventorying, and managing code portfolios can significantly reduce compliance effort and help control security vulnerabilities. Open source scanning tools and open source license management systems have become considerably more affordable in recent years, so a structured open source software adoption process, once affordable only to large organizations, is now within the means of most companies. A policy-based practice, aided by automated tools, lets organizations keep track of their open source, commercial, and proprietary software, manage the attributes of the portfolio at different points in the development lifecycle, and integrate compliance into their operations.
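Treating the audit as continuous means comparing each new inventory with the previous one, so only the differences need review. In the sketch below, each inventory is a hypothetical mapping from component name to a (version, license) pair, standing in for whatever a scanning tool exports. A version bump counts as a change because licenses can differ between versions.

```python
from __future__ import annotations


def diff_inventories(
    previous: dict[str, tuple[str, str]],
    current: dict[str, tuple[str, str]],
) -> dict[str, list[str]]:
    """Components added, removed, or changed (version or license) since the last audit."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name for name in current.keys() & previous.keys()
        if current[name] != previous[name]  # version or license differs
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Running this at each release or merge keeps the review incremental instead of repeating a full audit.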

Open source software adoption process

A structured software adoption process addresses many aspects, such as defining rules on acceptable software and software integration practices, and what to do when a project deviates from these standards. It can provide a package request and preapproval workflow in which development teams ask to use a piece of software in a certain way within a project. These requests can be opened, assigned to the appropriate person or people, assessed using automated and semi-automated tools, and then approved or rejected. Increasingly, real-time code assessment tools (sometimes known as developer assistants) running on developer workstations are deployed as the first line of discovery and containment. Communicating the policies and ensuring the practices are adopted are essential to a successful structured open source software adoption process.
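The request-and-preapproval workflow is essentially a small state machine. This sketch assumes four hypothetical states and a simple automated first pass that rejects requests whose license is on a deny list and leaves all other requests for a human approver. A real system would also record the policy version and the assessment evidence.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    OPEN = "open"
    UNDER_REVIEW = "under review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Permitted moves; approved and rejected are terminal in this sketch.
ALLOWED = {
    State.OPEN: {State.UNDER_REVIEW},
    State.UNDER_REVIEW: {State.APPROVED, State.REJECTED},
    State.APPROVED: set(),
    State.REJECTED: set(),
}


@dataclass
class PackageRequest:
    package: str
    intended_use: str  # e.g. "dynamically linked in shipped product"
    state: State = State.OPEN
    history: list[tuple[State, State, str, str]] = field(default_factory=list)

    def transition(self, new_state: State, actor: str, note: str = "") -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {new_state.value}")
        self.history.append((self.state, new_state, actor, note))  # audit trail
        self.state = new_state


def auto_assess(request: PackageRequest, license_id: str, denied: set[str]) -> None:
    """Automated first pass: reject denied licenses, otherwise leave for a human."""
    request.transition(State.UNDER_REVIEW, "scanner")
    if license_id in denied:
        request.transition(State.REJECTED, "scanner", f"license {license_id} is not permitted")
```

The recorded history serves as documentation the next audit can rely on, showing who approved each component and for what use.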

Conclusion

Any transaction involving software can be delayed or derailed by uncertainties around the IP ownership, obligations, and security vulnerabilities. The discovery process can be arduous, depending on the organization of the code portfolio, quality of the records, and the level of documentation.

Following a few simple steps, such as separating commercial, open source, and proprietary code, maintaining original software licenses and folder structures, and ensuring every source file has identifying header information, can streamline the audit process significantly. Regular audits, aided by automated software scanning and license or security vulnerability management solutions, can significantly reduce effort and risk in any software-based technology organization.

About the author

CMCrossroads is a TechWell community.

Through conferences, training, consulting, and online resources, TechWell helps you develop and deliver great software every day.