ConfigurationManagement < SysAdmin

Historically, the Caltech Computer Science Department was among the first to implement configuration management as part of its system administration philosophy.

A methodology was sought for managing a growing number of computers, given a set of configuration items to be implemented on similar machines. After reviewing the tools available at the time, their advantages and disadvantages, and the needs of the environment, cfengine became the tool of choice. Cfengine succeeded because it could be customized in both depth and breadth yet remained flexible enough to handle special cases, where a configuration item needed to be excluded from, or appended to, the baseline of an arbitrary group of hosts. Initially, cfengine was deployed as part of a post-kickstart process that configured a newly installed machine before its first reboot; thereafter it maintained configuration state, running as a standalone configuration management system on each host.

Even though cfengine could operate in a server-client mode, resource and timing issues led to the conclusion that the standalone methodology was the best way to implement it. This decision effectively decentralized configuration management: each host made its own configuration changes based on a massive list of global configuration items. Furthermore, each system had to hold a copy of the entire body of configuration parameters for all systems under cfengine control; since there was no configuration oracle to consult, each system needed access to all configuration information in order to make decisions and act on its own configuration state. This implementation does not scale well, introduces security concerns, and adds complexity whenever new systems are introduced to the decentralized configuration base.
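
The standalone model can be illustrated with a minimal sketch, written in Python rather than cfengine syntax. Every host carries the full list of configuration items and filters it locally by its own group memberships. The item names, group names, `ConfigItem` structure, and `select_items` helper are all hypothetical; they only illustrate the selection logic, including the exclusion of a group from a baseline item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """One entry in the global configuration list (hypothetical structure)."""
    name: str
    groups: frozenset                  # groups the item applies to
    excluded: frozenset = frozenset()  # groups that opt out of the item


def select_items(host_groups, all_items):
    """Return the names of the items this host should apply.

    An item applies when the host shares at least one group with it and
    belongs to none of the item's excluded groups.
    """
    host_groups = set(host_groups)
    return [
        item.name
        for item in all_items
        if (host_groups & item.groups) and not (host_groups & item.excluded)
    ]


# Every host holds this whole list, even entries meant for other machines.
GLOBAL_ITEMS = [
    ConfigItem("ntp.conf", frozenset({"all"})),
    ConfigItem("httpd.conf", frozenset({"web"})),
    ConfigItem("baseline-firewall", frozenset({"all"}), frozenset({"lab"})),
]

print(select_items({"all", "web"}, GLOBAL_ITEMS))
print(select_items({"all", "lab"}, GLOBAL_ITEMS))
```

The sketch makes the cost visible: a lab host must still read the web server entries in order to decide that they do not apply to it.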

Even with a configuration management tool such as cfengine, growing complexity within a group of hosts can become as difficult to manage as it would be without the tool. For example, a web service requires configuration items to be installed and maintained on the system. Running multiple web services, however, means managing different applications and implementations of intrinsically similar services, so each host must differ slightly from the configuration base and requires slightly different configuration items specific to its service or application.

Modification operations in cfengine closely resemble the decisions a system administrator makes by hand, such as which specific web server should receive a configuration file, or which arbitrary group of machines should be excluded from a particular set of baseline configuration items. While cfengine can effectively handle operations such as restarting a service when a configuration item changes, the logic of that operation is left to the operator. The success of this custom logic depends critically on the system administrator programming the correct sequence of events within the cfengine configuration.
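
As an illustration of the ordering problem, here is a minimal Python sketch (not cfengine syntax) of a restart that is contingent on a configuration change. The `ensure_file` and `apply_with_restart` helpers, the file name, and the stand-in restart callback are assumptions made for the example. The point is the sequence: compare, write only if different, and restart only after a real change.

```python
import os
import tempfile


def ensure_file(path, desired):
    """Write `desired` to `path` only if it differs; return True on change."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            if handle.read() == desired:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(desired)
    return True


def apply_with_restart(path, desired, restart):
    """Update the file first, then restart the service only if it changed."""
    changed = ensure_file(path, desired)
    if changed:
        restart()
    return changed


restarts = []
with tempfile.TemporaryDirectory() as workdir:
    conf = os.path.join(workdir, "httpd.conf")
    # First run installs the file, so the stand-in restart fires once.
    apply_with_restart(conf, "Listen 80\n", lambda: restarts.append("httpd"))
    # Second run finds the file already correct and does nothing.
    apply_with_restart(conf, "Listen 80\n", lambda: restarts.append("httpd"))

print(restarts)
```

If the administrator reverses the order, or restarts unconditionally, the service is either restarted with the old configuration or restarted on every run, which is the kind of sequencing error the paragraph above warns about.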

Cfengine has never practically and efficiently reported failures in its operation when run in standalone mode. Although it was never thoroughly tested in server-client mode, it seems the reporting flaws would still be present there. In configuration management it is critical that operations are carried out and that any failures become known to the administrator; otherwise the state of the host is incorrect and unknown to those who could correct the flaws in its configuration. Similarly, it is important to know that all configuration changes are completed promptly and accurately, so that troubleshooting can proceed with confidence. Failures in configuration changes should be tied into the monitoring of host resources. While cfengine does report inconsistencies in its own logs, a decentralized standalone implementation has no coordinated view of failures, whether they are caused by misconfigurations or by problems with the configuration management itself. With growing complexity in a configuration management framework, it is imperative that administrators are aware of the state of the systems, both as a group and individually.
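
The kind of coordinated view described here can be sketched in a few lines of Python. The report format (a host name mapped to a list of item and success pairs) and the `summarize_runs` helper are hypothetical; cfengine does not produce this structure. The sketch only shows how per-host results could be combined into both an individual and a group-wide picture.

```python
from collections import Counter


def summarize_runs(results):
    """Summarize per-host run results into a group-wide view.

    `results` maps a host name to a list of (item, ok) pairs. This is a
    hypothetical report format used only for illustration.
    """
    failing_hosts = {}
    for host, items in results.items():
        failed = [item for item, ok in items if not ok]
        if failed:
            failing_hosts[host] = failed
    # Count how many hosts failed on each item to expose systematic problems.
    by_item = Counter(
        item for failed in failing_hosts.values() for item in failed
    )
    return failing_hosts, by_item


reports = {
    "web1": [("httpd.conf", True), ("ntp.conf", False)],
    "web2": [("httpd.conf", True), ("ntp.conf", False)],
    "db1": [("ntp.conf", True)],
}
failing_hosts, by_item = summarize_runs(reports)
print(sorted(failing_hosts))
print(by_item.most_common())
```

An item that fails on many hosts at once points to a problem in the configuration itself, while a single failing host points to a problem on that machine.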

Because of cfengine's methods, the way it was implemented here, the changing needs of the systems under our control, and the subtle changes between versions of a Linux distribution, cfengine is largely unable to handle the complexity introduced as a group of computers evolves. Thus, cfengine has outlived its usefulness as a configuration management tool.

In CS, we are beginning to adopt a new utility for automation in system configuration, due to cfengine's shortcomings.

To overcome these shortfalls in current configuration management, different tools were reviewed and/or tested. These include:

PIKT - while primarily intended for systems monitoring, it has secondary functionality for configuration management. PIKT uses a somewhat arcane configuration specification framework and did not seem to fit the needs described above.
LCFG - though it is designed for large-scale configuration management and can scale across an ever-changing environment, it seems to lack any ability to report whether configuration changes are actually being made.
puppet - since this tool appears to provide all of the functionality described above, it would seem to be a very good solution; however, its design philosophy appears too whimsical and may be harmful when deployed on a large scale, causing extra work later on.
bcfg2 - addresses all the shortcomings of cfengine and introduces new functionality that allows system administrators to coordinate and share configuration schemas; this is the chosen tool for CS. Bcfg2 helps system administrators produce a consistent, reproducible, and verifiable description of their environment, and offers visualization and reporting tools to aid in day-to-day administrative tasks. It is the fifth generation of configuration management tools developed in the Mathematics and Computer Science Division of Argonne National Laboratory.

Bcfg2 is based on an operational model in which the specification can be used to validate, and optionally change, the state of clients. In a feature unique to Bcfg2, the client's response to the specification can also be used to assess the completeness of the specification. Using this feature, Bcfg2 provides an objective measure of how well an administrator has specified the configuration of client systems. Bcfg2 is therefore built to help administrators construct an accurate, comprehensive specification. It has been designed from the ground up to support gentle reconciliation between the specification and current client states, and to cope gracefully with manual system modifications.
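
The idea of using client state to judge the specification can be sketched as follows. This is a simplified model in Python and not Bcfg2's actual code or data format: the `assess` function, the entry names, and the completeness ratio are assumptions chosen for illustration. Entries found on the client but missing from the specification are what reveal an incomplete specification.

```python
def assess(specification, client_state):
    """Compare a specification with observed client state.

    Both arguments map an entry name (for example a file path or package)
    to a value. This is a simplified, hypothetical model, not Bcfg2's code.
    """
    correct = sorted(
        name
        for name, value in specification.items()
        if name in client_state and client_state[name] == value
    )
    incorrect = sorted(name for name in specification if name not in correct)
    # Entries present on the client but absent from the specification show
    # where the specification is incomplete.
    unmanaged = sorted(name for name in client_state if name not in specification)
    total = len(specification) + len(unmanaged)
    completeness = len(specification) / total if total else 1.0
    return {
        "correct": correct,
        "incorrect": incorrect,
        "unmanaged": unmanaged,
        "completeness": completeness,
    }


spec = {"/etc/ntp.conf": "v2", "openssh": "9.6"}
client = {
    "/etc/ntp.conf": "v1",        # differs from the specification
    "openssh": "9.6",             # matches the specification
    "/etc/motd": "local edit",    # manual change, not in the specification
    "telnetd": "0.17",            # installed by hand, not in the specification
}
report = assess(spec, client)
print(report)
```

In this model an incorrect entry is something the tool could fix on the client, while an unmanaged entry is something the administrator must decide to add to the specification or remove from the host.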

Finally, due to the rapid pace of updates on modern networks, client systems are constantly changing; Bcfg2 can enable the construction of complex change management and deployment strategies.

Since then, we have moved to SaltStack. Salt documentation is forthcoming.

After a vulnerability was found in SaltStack, and then exploited against a few machines in CMS, we have entirely disabled and uninstalled SaltStack.

We now use Ansible. Ansible operates over SSH using ordinary SSH keys, yet in my opinion it is far more capable than any of the systems mentioned above.

-- DavidLeBlanc

Topic revision: r2 - 2025-04-22, DavidLeBlanc
