Originally formed within the Computer Science Department, the systems
administration team has evolved with the department's growing needs,
creating solutions to new problems as they arise and instituting policy
to head off situations that could lead to Honor Code violations.
Initially, the CS department was self-administered, with various
postdocs, graduate students, and undergraduates separately maintaining
portions of the CS infrastructure. It eventually became clear that this
approach does not scale within a department and stifles the
productivity of both system administrators and users.
In February of 2000, CS had the following issues:
- nothing is documented
- little or no discipline in setup (wiring, installation, versioning, documentation)
- passwd/hosts/fs/group setup is a non-standard, undocumented, complicated, and error-prone process
- machines are overloaded with multiple services, causing interaction problems
- built on a hodge-podge collection of equipment
  - servers
  - clients
  - setup
- hodge-podge location and management of equipment
- (in the past) too many bosses, each with their own agenda and no coherent coordination between them
- no stated policy on the target level of service
The goals to address these issues were presented circa February 2000:
- file servers only serve files.
  - replace all existing file servers
  - discipline ourselves not to just put disks anywhere; having them all be the same and only serve files probably gives us 80% of the benefits here; for the last 20%, we can haggle over the particular type
  - options on the table:
    - NetApp
      - lowest price runs $22k/100GB
      - heard nothing but praise for these for uptime and for reducing system administration time
    - Sun RAID (viable if they would donate it to us)
    - Linux PC+RAID
      - was pricing $7K/100GB
      - if we use 50GB disks, we could perhaps do $10K/200GB
      - not locked into a vendor; lowest price per GB
  - this only works if we can really do it department-wide
- we can still run our own DNS and allocate IP addresses in our range
- ITS manages wiring, hubs, etc. (labels them...)
  - the big benefit is that it's simply not our problem
- Service interactions are bad.
  - force compromises in how machines are run and managed
  - increase the likelihood of failure/downtime
  - complicate rebuild/upgrade/migration
- Machines are cheap.
- Servers have an identical, generic setup; uniqueness is isolated to the service each provides.
- Central Services:
  - DNS
  - YP? / Hesiod? / whatever
  - mail (? also need to separate imap and non-imap mailer?)
  - web
  - backup
  - ftp
  - license server(s) (maybe more than one)
- Straightforward account procedures
  - dataset
  - host table
  - passwd
  - fs map
  - groups
  - printer info (printcap)
  - documented addition procedures
  - RCS or equivalent version logging
    - for users
    - for professional or grad. student system hackers
- There are good reasons for wanting particular, perhaps diverging, machines.
  - at a minimum, machines need to be identical within a class
  - they should only differ in essentials (name, IP address, ...)
  - centrally managed, standard application binary set
    - auto-reconcile with the master copy (see the sketch after this list)
  - group-specific software lives on group file servers/systems
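As an illustration of the "auto-reconcile with master" item above, a single
rsync run can mirror the standard binary set from a master host and remove any
local drift. The host name and paths below are hypothetical, and the page does
not record which tool was eventually used, so this is only a sketch of the
approach:

    # Hypothetical reconcile step: mirror /usr/local from a (made-up) master
    # host; --delete removes anything present locally but not on the master.
    rsync -a --delete master.cs.example.edu:/usr/local/ /usr/local/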
A Computer Science Department lunchtime discussion took place to answer
lingering questions and to inform faculty, postdocs, grads, and staff about the
plan to centralize and organize the hodge-podge that was the CS infrastructure
at the time.
You can review the presentation in this
PDF file.
Since that time, the CS Sysadmin Team has innovated services and
infrastructure well beyond its beginnings in 2000, and was the first at
Caltech to implement a long list of technologies, methodologies, and
policies.
Technologies first implemented by CS at Caltech include:
- LDAP - Lightweight Directory Access Protocol, used in CS for authentication, authorization, and mail aliases. Hardware and software inventory was also stored in the LDAP directory for a short time.
- SNMP - Simple Network Management Protocol, used in CS to actively and passively monitor resources so the team was aware of outages before CS users experienced a problem. SNMP in CS evolved into a trending and planning tool used to anticipate future hardware needs as usage grew.
- SMTP-Auth - SMTP Authentication, used in CS to allow someone off campus to route outgoing email through the CS mail servers; the servers relay mail only after the sender has authenticated, which prevents unauthorized use (a configuration sketch follows this list).
- Maildir - an efficient mail storage format in which each email message is its own file on disk, implemented in CS to reduce or eliminate the file-locking problems of the traditional 'mbox' format, in which all messages are stored in a single file (a short example follows this list).
- access.conf - created as a Pluggable Authentication Module (pam_access) by a CS postdoctoral scholar before he joined Caltech, to restrict remote access to specific users and groups; implemented in CS to deny ordinary ('mortal') users access to infrastructure servers (a configuration sketch follows this list).
- Documentation System - initially implemented for both system administrators and research members of CS, Computer Science had the first dynamic documentation system at Caltech, cataloging policies, FAQs, and procedures for new users. It has evolved through several iterations over the years, beginning with Zope, then a Content Management System (CMS), then ikiwiki, a wiki compiler in which documents are stored in a version control system, and finally, today, FosWiki.
- Canonical configuration management system - originally used to handle the automated configuration of a few dozen FreeBSD hosts in the Intel Lab; it has since evolved into a bi-directional automated system in which changes are reported to a central mechanism to confirm that configuration changes were applied.
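To make the SMTP-Auth behavior concrete, the fragment below shows how a mail
server can be told to relay only for authenticated senders. Postfix is used
purely as an illustration; the page does not say which MTA CS actually ran, so
this is a sketch of the policy rather than the department's configuration:

    # main.cf fragment: relay mail for the local networks or for clients that
    # have authenticated via SASL; refuse relaying for everyone else.
    smtpd_sasl_auth_enable = yes
    smtpd_recipient_restrictions =
        permit_mynetworks,
        permit_sasl_authenticated,
        reject_unauth_destination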
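A minimal Python sketch, using only the standard library, shows the
one-file-per-message Maildir layout described above. The path and message
contents are hypothetical; this is not the department's delivery code:

    # Opening a Maildir with create=True makes its tmp/, new/ and cur/
    # subdirectories; add() writes the message as a single new file, so no
    # mbox-style locking of a shared spool file is needed.
    import mailbox
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "alice@example.edu"
    msg["To"] = "bob@example.edu"
    msg["Subject"] = "Maildir demo"
    msg.set_content("Each message becomes its own file under new/ or cur/.")

    md = mailbox.Maildir("/tmp/demo-maildir", create=True)
    key = md.add(msg)
    print(key, md.get_message(key)["Subject"])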
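The access.conf mechanism above is driven by /etc/security/access.conf and
enabled from the PAM account stack. The rules below are a hypothetical policy
sketch (the group name is invented), not the department's actual access list:

    # /etc/security/access.conf; format is "permission : users or (groups) : origins".
    # Allow root on local logins and members of a hypothetical sysadmin group
    # from anywhere; deny every other ("mortal") user.
    + : root : LOCAL
    + : (sysadmin) : ALL
    - : ALL : ALL

    # Enabled from the PAM account stack, e.g. in /etc/pam.d/sshd:
    #   account  required  pam_access.so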
Typically, these innovations were accomplished out of pure necessity,
to provide a scalable, sustainable environment with limited manpower.
Since accomplishing this goal, the systems administration team has
progressed to anticipating user requests within the department in
order to better serve ever-changing needs.
--
AdminUser - 2019-10-07