Initially we had a fairly stringent list of requirements for our
client-side database engine. It needed to be smaller, easier to install,
and easier to maintain than Oracle, or we might as well just use our Oracle
site license. It also needed row locking, so that we would not simply
trade flat file locking problems for table locking problems. It had to
have a track record of handling large numbers of entries relative to our
needs (figure 100,000 to be safe). It had to handle rows wide enough to
fit our data. It had to have a Perl interface and a C interface. And most
importantly, it had to be both reasonably mature and actively supported.
We did most of our searching at the following URL:
ftp://ftp.idiom.com/pub/free-databases
After perusing that list and asking many RU staff for suggestions, we narrowed the field to the following three, though none was a perfect fit.
Sleepycat/Berkeley DB
Pro: Small, fast, offers underpinnings for a locking
scheme, supports C and Perl, very little to do in the way of administration.
Con: Application-specific API rather than SQL; only extensible
by programmers with the right libraries; unknown frequency of support
and vague history of real-world use; no row locking.
MySQL
Pro: Relatively small, fast, supports large DBs,
supports Perl and C, uses SQL, already installed on most of the systems
targeted for RATS v2. Frequently updated, rapid bug fixes. Proven history.
Con: No row locking. Frequently updated.
PostgreSQL
Pro: Proven history. Row locks. SQL. Perl and C
support.
Con: Large. Difficult to administer (might as well use
Oracle).
Given that we could not get row locking, stability, and less complexity
than Oracle in one package, MySQL was the clear winner for its ease of
adoption in production and its ongoing, frequent support. The less that is
completely new to the administrators, the better.
Since the main point of adding the DB is to avoid contention and system degradation due to locks on large flat files that are slow to traverse, we need to make sure we get all the files into the DB that we want. We need to get the password file, the shadow file, and the group file into the DB. That covers the standard UNIX files that need to move. However, RATS added another flat file that can potentially cause problems for UID reservation, and that too needs to move into the DB. We will also put in a table representing users and the account types they have, in order to allow RATS to do some types of reporting and checking it has not been able to do efficiently before. Note that, as currently designed, the client database is no more intended for historical record keeping than the normal UNIX flat files it will be replacing.
DB design
* == primary key member
Note on primary keys: username is part of the primary key where applicable, rather than UID, because we want the implementation to be able to handle multiple clusters in the future. Since there is no mandatory university-wide unified UID space, we figured it would be easier to move this model if the UID was not constrained to the role of unique identifier within the client database. Username is already globally unique within the RATS software.
SHADOW/PASSWORD TABLE:
This table models the shadow and password flat files
as a single table. This is done to eliminate the possibility of writing
a complete password entry without writing a complete shadow entry or vice
versa.
The column CLST is there for multi-cluster support in the future. The
active field is there to support UID reservation. The CHG field is there
to support our home-brewed "locking" scheme.
USR - VARCHAR(32) *
PASS - VARCHAR(128)
UID - UINT (32-bit)
GID - UINT
GCOS - VARCHAR(128)
HOME - VARCHAR(128)
SHELL - VARCHAR(128)
LAST - DATE
MIN - UINT
MAX - UINT
WARN - UINT
INACT - UINT
EXP - DATE
FLAG - CHAR(1) or UINT or ENUM (haven't decided)
CHG - DATE
CLST - UINT*
ACTIVE - ENUM
COMMENT - VARCHAR(256)++
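As a sketch, the column list above translates into DDL roughly as follows. The composite primary key (USR, CLST) follows the * markers; the exact types are assumptions (the FLAG type is explicitly undecided above, and the ENUM fields are written as CHAR(1) so the snippet also parses under Python's built-in sqlite3, which stands in here purely for illustration — the production target is MySQL):

```python
import sqlite3

# Illustrative DDL for the shadow/password table. Column names follow the
# list above; types are assumptions (ENUMs shown as CHAR(1) so the
# statement also runs under SQLite, used only to sanity-check the schema).
DDL = """
CREATE TABLE shadow_password (
    USR     VARCHAR(32)  NOT NULL,   -- username, * primary key member
    PASS    VARCHAR(128),
    UID     INTEGER,
    GID     INTEGER,
    GCOS    VARCHAR(128),
    HOME    VARCHAR(128),
    SHELL   VARCHAR(128),
    LAST    DATE,
    MIN     INTEGER,
    MAX     INTEGER,
    WARN    INTEGER,
    INACT   INTEGER,
    EXP     DATE,
    FLAG    CHAR(1),                 -- CHAR(1)/UINT/ENUM still undecided
    CHG     TIMESTAMP,               -- drives the "locking" scheme below
    CLST    INTEGER NOT NULL,        -- cluster ID, * primary key member
    ACTIVE  CHAR(1),                 -- stands in for the ENUM active flag
    COMMENT VARCHAR(256),
    PRIMARY KEY (USR, CLST)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

The composite key means the same username can exist once per cluster, which is the multi-cluster behavior the CLST column is there for.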
GROUP REFERENCE TABLE:
This table lists the data about the groups individually.
It does not contain the list of people in the group. CHG and ACTIVE are
the same as for shadow/password.
GNAME - VARCHAR(32)*
PASS - VARCHAR(128)
GID - UINT
CHG - DATE
CLST - UINT*
ACTIVE - ENUM
COMMENT - VARCHAR(256)++
GROUP MEMBER TABLE:
This table lists the members of a group. There is
no active flag because you are either in a group or not. Strictly
add/delete. No change.
GID - UINT*
USR - VARCHAR(32)*
CLST - UINT*
COMMENT - VARCHAR(256)++
ACCOUNT TYPES TABLE:
This table lists username account type pairs. It
is another add/delete only table.
TYPE - VARCHAR(64)*
USR - VARCHAR(32)*
CLST - UINT*
COMMENT - VARCHAR(256)++
++ = these comment fields may need to be broken out into a separate reference table for performance reasons during implementation, so that they can be excluded from frequent operations where their content is not meaningful
QUOTA TABLE:
The table will be generated algorithmically, but
will really only contain data in need of being flushed to the systems'
quotas. RATS adds an entry to the table to be flushed for user X. At periodic
intervals, all the quota daemons will pull entries to be flushed for their
system. This table will consist solely of transient entries. After flushing,
the table will be empty (on multi-cluster systems, there may always be
something in need of flushing).
UID - UINT*
FILE SYSTEM - VARCHAR(128)*
SOFT - UINT
HARD - UINT
CLST - UINT*
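The flush cycle described above — RATS queues an entry, a quota daemon pulls the entries for its filesystems, applies them, and deletes them — can be sketched as follows. The function and table names are hypothetical, sqlite3 again stands in for MySQL, and a real daemon would presumably also constrain on CLST:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quota_flush ("
             " UID INTEGER, FS VARCHAR(128), SOFT INTEGER, HARD INTEGER,"
             " CLST INTEGER, PRIMARY KEY (UID, FS, CLST))")

# RATS queues a quota change for user 1001 on /home.
conn.execute("INSERT INTO quota_flush VALUES (1001, '/home', 500, 600, 1)")
conn.commit()

def flush_for(fs, apply_quota):
    """Pull the pending entries for one filesystem, apply each, delete each."""
    rows = conn.execute("SELECT UID, SOFT, HARD FROM quota_flush"
                        " WHERE FS = ?", (fs,)).fetchall()
    for uid, soft, hard in rows:
        apply_quota(uid, soft, hard)      # e.g. shell out to a setquota tool
        # Delete only the rows we just applied, so entries queued after the
        # SELECT survive until the next flush pass.
        conn.execute("DELETE FROM quota_flush WHERE UID = ? AND FS = ?",
                     (uid, fs))
    conn.commit()
    return rows

applied = []
flushed = flush_for("/home",
                    lambda uid, soft, hard: applied.append((uid, soft, hard)))
```

After the pass, the table holds nothing for that filesystem — matching the transient-entries behavior described above.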
LOCKING SOLUTION
Since we were unable to find an acceptable answer to our row locking
problem, we have devised a way around it. Basically, all reads and writes
are atomic. Since the read should only occur in one table at a time, and
MySQL assures us that a read and a write will not occur simultaneously,
the read always reads data as safely as one can really expect given our
operating model (i.e. row locks would give us nothing). For any given API
call, it should never be writing to more than one table. The write should
take one of two forms. It is either an insert ( a new entry), or an update
(a change to an existing entry). Inserts should never cause contention
or race conditions over data because an insert will fail if the unique
data already exists.
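The insert case can be made concrete: because the username (plus cluster) is the primary key, a second client racing to insert the same entry simply gets an error instead of silently clobbering the first. A minimal sketch, with hypothetical names and sqlite3 standing in for MySQL (the duplicate-key behavior illustrated is the same idea):

```python
import sqlite3

# Two "clients" race to insert the same username; the second insert fails
# on the primary key instead of overwriting the first.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pw (USR VARCHAR(32), CLST INTEGER, UID INTEGER,"
             " PRIMARY KEY (USR, CLST))")

def reserve(usr, clst, uid):
    """Attempt to insert a new entry; return True on success."""
    try:
        conn.execute("INSERT INTO pw (USR, CLST, UID) VALUES (?, ?, ?)",
                     (usr, clst, uid))
        return True
    except sqlite3.IntegrityError:       # duplicate primary key
        return False

first  = reserve("alice", 1, 1001)       # wins
second = reserve("alice", 1, 1002)       # loses: USR/CLST already taken
```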
This leaves us with one situation to cover, and that is the data change
/ update. Basically, all changes will have a variable attached to them
containing a timestamp. If the operation is read data -> alter data locally
-> write data to DB, then the timestamp is populated by reading the CHG
date from a given table. When attempting to write the alterations, it will
constrain the update with a "where CHG = timestamp". The write will then
error if the CHG date has been altered since the data was fetched, and
return you to the data change screen with some method of comparing your
data vs. what is now in the DB. All writes will set the CHG date in a
table to the current date/time. I would rather have row locks, but this
is a decent workaround. At this point one should note that RATS is generally
going to try to avoid race conditions and contention over any single row.
There really shouldn't be too many clients grabbing the same row of data
at the same time anyway. It is currently planned to use the built-in date
data type with an intended resolution of one second. If this is insufficient,
we will come up with a different datatype for the timestamp during implementation.
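The read -> alter -> conditional write cycle above can be sketched as follows. The helper names are hypothetical, sqlite3 stands in for MySQL, and a plain integer stands in for the CHG date (a real write would set it to the current date/time); the WHERE clause is exactly the "where CHG = timestamp" constraint described:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pw (USR TEXT PRIMARY KEY, SHELL TEXT, CHG INTEGER)")
conn.execute("INSERT INTO pw VALUES ('alice', '/bin/sh', 100)")

def read_entry(usr):
    """Fetch the row along with the CHG stamp it carried at read time."""
    return conn.execute(
        "SELECT SHELL, CHG FROM pw WHERE USR = ?", (usr,)).fetchone()

def write_entry(usr, shell, stamp):
    """Conditional update: succeeds only if CHG is still what we read."""
    cur = conn.execute(
        "UPDATE pw SET SHELL = ?, CHG = ? WHERE USR = ? AND CHG = ?",
        (shell, stamp + 1, usr, stamp))   # stamp + 1 stands in for now()
    return cur.rowcount == 1              # 0 rows -> someone beat us to it

shell, stamp = read_entry("alice")
ok       = write_entry("alice", "/bin/tcsh", stamp)  # first writer wins
conflict = write_entry("alice", "/bin/zsh",  stamp)  # stale stamp, fails
```

On a failed write, the caller would re-read the row and present the comparison screen described above rather than retrying blindly.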