DB Engine Selection


Initially we had a fairly stringent list of requirements for our client side database engine. It needed to be smaller, easier to install, and easier to maintain than Oracle, or we might as well just use our Oracle site license. It also needed row locking, so that we would not merely replace our flat file locking problems with table locking problems. It had to have a track record of handling large numbers of entries relative to our needs (figure 100,000 to be safe), and it had to handle rows wide enough to fit our data. It had to have a Perl interface and a C interface. And most importantly, it had to be both reasonably mature and actively supported. We did most of our searching at the following URL:

ftp://ftp.idiom.com/pub/free-databases

After perusing that and asking many RU staff for suggestions, we narrowed the field to the following three candidates, though none was a perfect fit.

Sleepycat/Berkeley DB
    Pro: Small, fast, offers underpinnings for a locking scheme, supports C and Perl, very little to do in the way of administration.
    Con: Application-specific API rather than SQL, only extensible by programmers with the right libraries, unknown frequency of support and vague history of real-world use, no row locking.

MySQL
    Pro: Relatively small, fast, supports large DBs, supports Perl and C, uses SQL, already installed on most of the systems targeted for RATS v2. Frequently updated, rapid bug fixes. Proven history.
    Con: No row locking. Frequently updated.

PostgreSQL
    Pro: Proven history. Row locks. SQL. Perl and C support.
    Con: Large. Difficult to administer (might as well use Oracle).

Given that we could not get row locking, stability, and less complexity than Oracle in a single package, MySQL was the clear winner on ease of adoption in production and on ongoing, frequent support. The less that is completely new to the administrators, the better.
 
 

DATABASE DESIGN

DB Contents

Since the main point of adding the DB is to avoid contention and system degradation due to locks on large flat files that are slow to traverse, we need to make sure we get all the files we want into the DB. We need to get the password file, the shadow file, and the group file into the DB. That covers the standard UNIX files that need to move. However, RATS added another flat file that can potentially cause problems for UID reservation, and that too needs to move into the DB. We will also add a table representing users and the account types they have, in order to allow RATS to do some types of reporting and checking it has not been able to do efficiently before. Note that, as currently designed, the client database is not intended for historical record keeping any more than the standard UNIX flat files it will replace.

DB Design

* == primary key member

Note on primary keys: username is part of the primary key where applicable, rather than UID, because we want the implementation to be able to handle multiple clusters in the future. Since there is no mandatory university-wide unified UID space, we figured it would be easier to port this model if the UID was not constrained to the role of unique identifier within the client database. Username is already globally unique within the RATS software.

SHADOW/PASSWORD TABLE:
    This table models the shadow and password flat files as a single table. This is done to eliminate the possibility of writing a complete password entry without writing a complete shadow entry, or vice versa.
The column CLST is there for future multi-cluster support. The ACTIVE field is there to support UID reservation. The CHG field is there to support our home-brewed "locking" scheme.

USR - VARCHAR(32) *
PASS - VARCHAR(128)
UID - UINT (32-bit)
GID - UINT
GCOS - VARCHAR(128)
HOME - VARCHAR(128)
SHELL - VARCHAR(128)
LAST - DATE
MIN - UINT
MAX - UINT
WARN - UINT
INACT - UINT
EXP - DATE
FLAG - CHAR(1) or UINT or ENUM (haven't decided)
CHG - DATE
CLST - UINT*
ACTIVE - ENUM
COMMENT - VARCHAR(256) ++
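The combined shadow/password table above can be sketched in DDL. The following is a non-authoritative sketch in SQLite (the real target is MySQL): UINT is approximated as INTEGER, the undecided FLAG and the ACTIVE ENUM are approximated as CHAR(1) (ACTIVE with a CHECK constraint), and DATE columns are stored as text.

```python
import sqlite3

# Sketch of the combined shadow/password table, approximated in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pwd_shadow (
        usr     VARCHAR(32)  NOT NULL,
        pass    VARCHAR(128),
        uid     INTEGER,
        gid     INTEGER,
        gcos    VARCHAR(128),
        home    VARCHAR(128),
        shell   VARCHAR(128),
        last    DATE,
        min     INTEGER,
        max     INTEGER,
        warn    INTEGER,
        inact   INTEGER,
        exp     DATE,
        flag    CHAR(1),
        chg     DATE,
        clst    INTEGER NOT NULL,
        active  CHAR(1) CHECK (active IN ('Y', 'N')),
        comment VARCHAR(256),
        PRIMARY KEY (usr, clst)  -- username plus cluster, not UID
    )
""")
conn.execute(
    "INSERT INTO pwd_shadow (usr, uid, gid, clst, active) VALUES (?, ?, ?, ?, ?)",
    ("jdoe", 1001, 100, 1, "Y"),
)
row = conn.execute(
    "SELECT uid FROM pwd_shadow WHERE usr = ? AND clst = ?", ("jdoe", 1)
).fetchone()
print(row[0])  # 1001
```

Making (USR, CLST) the primary key, rather than UID, is what keeps the model portable to multiple clusters as described in the note above.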

GROUP REFERENCE TABLE:
    This table lists the data about the groups individually. It does not contain the list of people in the group. CHG and ACTIVE are the same as for shadow/password.

GNAME - VARCHAR(32)*
PASS - VARCHAR(128)
GID - UINT
CHG - DATE
CLST - UINT*
ACTIVE - ENUM
COMMENT - VARCHAR(256)++

GROUP MEMBER TABLE:
    This table lists the members of a group. There is no ACTIVE flag because a user is either in a group or not: strictly add/delete, no change.

GID - UINT*
USR - VARCHAR(32)*
CLST - UINT*
COMMENT - VARCHAR(256)++
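The two group tables, and the add/delete-only property of membership, can be sketched the same way. This is a hedged SQLite approximation of the MySQL target; the point it demonstrates is that the composite primary key makes a duplicate membership insert fail outright, which is exactly why add/delete-only tables are safe without row locks.

```python
import sqlite3

# Sketch of the group reference and group member tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (
        gname   VARCHAR(32) NOT NULL,
        pass    VARCHAR(128),
        gid     INTEGER,
        chg     DATE,
        clst    INTEGER NOT NULL,
        active  CHAR(1),
        comment VARCHAR(256),
        PRIMARY KEY (gname, clst)
    );
    CREATE TABLE grp_member (
        gid     INTEGER NOT NULL,
        usr     VARCHAR(32) NOT NULL,
        clst    INTEGER NOT NULL,
        comment VARCHAR(256),
        PRIMARY KEY (gid, usr, clst)  -- membership is add/delete only
    );
""")
conn.execute("INSERT INTO grp_member (gid, usr, clst) VALUES (100, 'jdoe', 1)")
# A second identical insert fails on the primary key rather than
# silently duplicating or clobbering anything.
try:
    conn.execute("INSERT INTO grp_member (gid, usr, clst) VALUES (100, 'jdoe', 1)")
    duplicated = True
except sqlite3.IntegrityError:
    duplicated = False
print(duplicated)  # False: the duplicate insert was rejected
```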

ACCOUNT TYPES TABLE:
    This table lists username / account type pairs. It is another add/delete-only table.
TYPE - VARCHAR(64)*
USR - VARCHAR(32)*
CLST - UINT*
COMMENT - VARCHAR(256)++

++ = these comment fields may need to be broken out into a separate reference table for performance reasons during implementation, so that they can be excluded from frequent operations where their content is not meaningful

QUOTA TABLE:
    This table will be generated algorithmically, and will only contain data in need of being flushed to the systems' quotas. RATS adds an entry to the table to be flushed for user X. At periodic intervals, the quota daemons will pull the entries to be flushed for their system. This table consists solely of transient entries; after flushing, the table will be empty (on multi-cluster systems, there may always be something in need of flushing).

UID - UINT*
FILE SYSTEM - VARCHAR(128)*
SOFT - UINT
HARD - UINT
CLST - UINT*
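One flush cycle of the transient quota table can be sketched as follows. This is a hedged sketch in SQLite rather than MySQL, and the idea that a daemon selects its pending entries by filesystem is an assumption of the sketch, not a settled detail of the design.

```python
import sqlite3

# Sketch of the transient quota table and one daemon flush cycle.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE quota (
        uid        INTEGER NOT NULL,
        filesystem VARCHAR(128) NOT NULL,
        soft       INTEGER,
        hard       INTEGER,
        clst       INTEGER NOT NULL,
        PRIMARY KEY (uid, filesystem, clst)
    )
""")
# RATS queues a pending quota change for user 1001.
conn.execute(
    "INSERT INTO quota VALUES (?, ?, ?, ?, ?)",
    (1001, "/export/home", 50000, 60000, 1),
)

def flush_quotas(conn, filesystem):
    """Pull the pending entries for this system, apply them, delete them."""
    rows = conn.execute(
        "SELECT uid, soft, hard FROM quota WHERE filesystem = ?", (filesystem,)
    ).fetchall()
    for uid, soft, hard in rows:
        pass  # the real daemon would call the OS quota interface here
    conn.execute("DELETE FROM quota WHERE filesystem = ?", (filesystem,))
    return rows

applied = flush_quotas(conn, "/export/home")
remaining = conn.execute("SELECT COUNT(*) FROM quota").fetchone()[0]
print(len(applied), remaining)  # 1 0 -- the table is empty after the flush
```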
 

LOCKING SOLUTION

Since we were unable to find an acceptable answer to our row locking problem, we have devised a way around it. Basically, all reads and writes are atomic. Since a read should only touch one table at a time, and MySQL assures us that a read and a write will not occur simultaneously, the read always sees data as safely as one can really expect given our operating model (i.e. row locks would gain us nothing). Any given API call should never write to more than one table. A write takes one of two forms: it is either an insert (a new entry) or an update (a change to an existing entry). Inserts should never cause contention or race conditions over data, because an insert will fail if an entry with the same unique key already exists.
This leaves us with one situation to cover: the data change/update. Basically, all changes will have a variable attached to them containing a timestamp. If the operation is read data -> alter data locally -> write data to DB, then the timestamp is populated by reading the CHG date from the given table. When attempting to write the alterations, the client constrains the update with a "WHERE CHG = timestamp" clause. The write then fails if the CHG date has been altered since the data was fetched, and the user is returned to the data change screen with some method of comparing their data against what is now in the DB. All writes set the CHG date in a table to the current date/time. I would rather have row locks, but this is a decent workaround. One should note that RATS is generally going to try to avoid race conditions and contention over any single row; there really should not be many clients grabbing the same row of data at the same time anyway. The current plan is to use the built-in DATE data type with an intended resolution of one second. If that proves insufficient, we will come up with a different datatype for the timestamp during implementation.
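The timestamp workaround above can be sketched concretely. This is a minimal sketch in SQLite with CHG stored as text (the real design uses MySQL's DATE type); the table and the update_shell helper are illustrative, not the actual API.

```python
import sqlite3

# Sketch of the timestamp-based "locking" workaround: every write is
# constrained by the CHG value read earlier, so a concurrent change makes
# the update match zero rows instead of silently clobbering data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pwd_shadow (usr TEXT PRIMARY KEY, shell TEXT, chg TEXT)"
)
conn.execute(
    "INSERT INTO pwd_shadow VALUES ('jdoe', '/bin/sh', '1999-04-01 12:00:00')"
)

def update_shell(conn, usr, new_shell, seen_chg, now):
    """Write only if the row is unchanged since we read it; else report conflict."""
    cur = conn.execute(
        "UPDATE pwd_shadow SET shell = ?, chg = ? WHERE usr = ? AND chg = ?",
        (new_shell, now, usr, seen_chg),
    )
    return cur.rowcount == 1  # False means someone changed the row under us

# Read the row and remember its CHG timestamp.
shell, seen = conn.execute(
    "SELECT shell, chg FROM pwd_shadow WHERE usr = 'jdoe'"
).fetchone()

# The first writer succeeds; a second writer still holding the stale
# timestamp fails, and would be bounced back to the data change screen.
ok1 = update_shell(conn, "jdoe", "/bin/tcsh", seen, "1999-04-01 12:00:01")
ok2 = update_shell(conn, "jdoe", "/bin/zsh", seen, "1999-04-01 12:00:02")
print(ok1, ok2)  # True False
```

Checking the affected-row count after the constrained UPDATE is the whole trick: zero rows touched means the conflict was detected without any lock being held.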
 

DB Impact

The client side DB will have significant impact on systems. Practically speaking, the client side DB will now be the repository of valid user data on the system, rather than the standard UNIX flat files. Several problems fall out of this, and not all of them are entirely obvious.

The first problem that needs to be addressed is the population of the DB. Of course the RATS v2 system will feed new data into the system as accounts are created. However, since we want to install this on existing systems without deleting the userbase first, we need some way to populate the DB from the flat files that already exist on the system.

We will also receive new data on the fly from system administrator changes. All the current tools work on the flat files, but we cannot afford to continue that way on large systems at Rutgers for the purpose of account creation, so RATS has to speak DB. The UNIX administration tools all speak flat file. We cannot leave it that way, because we would never know which differences are correct and which are bogus. This means the administrative tools will have to speak DB, and if the administrative tools have to speak DB, so do the user tools that alter these files.

Another place systems are getting hammered is the current quota system. Although it does not really work off of a flat file, we can probably make it work more reliably by managing access to it through the client side DB with regard to account creation.

The fact that all the critical data is now in a DB also means we have to make sure that data is being backed up correctly. In general, moving to a DB like this is a serious impact to the system: everyone is pretty much going to have to learn some new tools and behaviors, and we are entrusting system integrity to something new, so if something does go wrong, fixing it will require new skills.
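The one-time population from existing flat files amounts to parsing the standard colon-separated passwd format and inserting each row. A hedged sketch, again using SQLite in place of MySQL and a cut-down table (the sample lines and column subset are illustrative only):

```python
import sqlite3

# Sketch of populating the DB from existing /etc/passwd-format lines.
PASSWD_LINES = [
    "root:x:0:0:Superuser:/root:/bin/sh",
    "jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/tcsh",
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pwd_shadow (
        usr TEXT, pass TEXT, uid INTEGER, gid INTEGER,
        gcos TEXT, home TEXT, shell TEXT, clst INTEGER,
        PRIMARY KEY (usr, clst)
    )
""")
for line in PASSWD_LINES:
    # Standard 7-field passwd layout: user:passwd:uid:gid:gcos:home:shell.
    usr, pw, uid, gid, gcos, home, shell = line.strip().split(":")
    conn.execute(
        "INSERT INTO pwd_shadow VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (usr, pw, int(uid), int(gid), gcos, home, shell, 1),
    )
count = conn.execute("SELECT COUNT(*) FROM pwd_shadow").fetchone()[0]
print(count)  # 2
```

A real run would read /etc/passwd, /etc/shadow, and /etc/group together so that the merged shadow/password rows are written in one pass.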
On the up side, abstracting this system data into a generic and flexible model like a DB means that transitions to things like LDAP may be easier, and we might even be able to improve current services like YP.