DBMS/Comparetm
- Do you have tens, hundreds, or thousands of databases and you aren't sure
what's in them?
- Have you made copies of tables and asked yourself:
- Are they exact copies?
- Have I deleted some fields?
- Did I add extra calculated fields?
- Is it a subset?
- Does it contain more records for the same project?
- Would you like one spreadsheet, text file or html page showing all variables
on all databases?
- Would you like to build an output file with all the values of all the variables
on any number of databases?
- Have you inherited a long term project and have no idea what is contained
in what file?
- Need to do work on that long ago forgotten project?
DBMS/Compare can help you answer these questions.
As hard drives have increased in size we tend not to delete
data files. Why should we? They might contain useful
information. What's a little wasted disk space?
In a world of faster computers we are able to play "what-if"
by building subsets, computing new fields and testing ideas
with the data. The end result is a disk drive full of
data. Some of it may be duplicates, some may be subsets
and some of it the original source data. DBMS/Compare,
an exciting new program from the developers of DBMS/COPY, is
here to help you make sense of your data files.
How does it work?
Select the Files
DBMS/Compare can search any directory and subdirectories looking
for your files. Here are the sample steps to follow:
- Specify the starting directory
- Decide if you want subdirectories scanned
- Define the filename filter
- Pick the file types you want to search or locate

The Database and Variable View
After DBMS/Compare finds all the files, it builds the following
window. The window has two views: Databases and
Variables.
The Database view, shows the following information on
each file: the directory, filename, extension, creation date,
record count, variable count, and a list of the variables on
the file.
The Variable view, shows the variable names, the number
of databases containing the variable, and a list of the databases.
In the screen shot below, the Database view shows the variable
names in a "compressed" view. The Variable view
shows the database names in an "expanded" view.
The database names are in the column heading and each variable/database
intersection has a colored block. You can switch back
and forth between compressed and expanded views.

Database Sorting Options
The database display can be sorted by: directory, filename, extension, creation
date, or variable count.
Here is the list of databases sorted by the record count.

Looking at the display you can quickly see that the 2nd and
3rd databases have the same record counts, same creation date
and what is probably the same variable list. (There is
a sort option that will check for variable name matches.)
You can also see that the 4th and 5th databases look like they
might be related to the other two databases because the variable
names are the same except for the last two digits. It
might be that the data is for 1997 and 1998.)
Variable Sorting Options
The variable display can be sorted by variable name or the
number of databases which contain a variable.
There is one more sort option. You can highlight one
variable name and then sort the others based how many databases
each variable has in common with the highlighted one.
The same sort can be done with the Database View -- highlight
one database and see what other databases share the same variables.
This is great when you have one database in mind and want to
see what other databases share the same variables.
Below is the Variable View with the variables sorted by closeness
to the ABSID variable. As you can see, variable BATCH
is on 19 databases and 5 of them also have the ABSID variable.

Variable Name Mapping
Frequently variables that contain the same type of data have
different names in different databases. This might be
because of naming conventions across projects or limitations
of the package (name length for example). When you want
to compare these variables, it is necessary that they have the
same name. DBMS/Compare gives you the power to map variable
names to other names. Below is the Variable View showing
that Address1 is mapped to ADDR1 and Address2
is mapped to ADDR2. All mapping is accomplished
with point and click. Any number of variables can be mapped
to any number of other variables. Once you are done mapping,
DBMS/Compare will rebuild the database and variable views to
show the mapping.

Creating Subsets
Once you have identified a set of related databases, DBMS/Compare
can create a "subset" view consisting of only those
databases and their variables. Any number of subsets can
be created. Subsets can be further subsetted.
Subsets are created by highlighting either database names or
variable names. If you click on a variable name, all the
databases containing that variable will be added to the subset.
The first column in the "database" view indicates
that the database will become part of the subset.
Below is a subset view of the databases containing the ABSID
variable. By specifying a title we make it easy to keep
our subsets organized.

What's In the Database
So far, the DBMS/Compare tools we have shown organize general
database information (name, creation date, number of records,
number of variables and variable names on the database).
These tools help you understand the structure of the data and
how the data files might relate to one another.
Now, we are going to show you the DBMS/Compare tools that help
you see the data inside the file -- variable values and individual
records.
Value Lists
Once you have a set of related databases, you might want to
know what are the variable's values. DBMS/Compare can
build value lists for all variables on a set of databases.
The following table, shows the values of the ABSID variable
across 5 databases. Each table cell shows the frequency
of that value on the database. Each value is color coded
based on the matching of counts. The color coding aids
in finding related databases. For example, value 1502
is on 5 records for mfraprot and mrafsum, 31 records
for mrafdrug and 11 times for mrafsurg.
It would appear that mrafprot and mrafsum are related, mrafprot
always has the same or lower value count than mrafsum, may be
it is a subset.

We can isolate the two databases in one value view so the comparison is easier.

Record Listings
In addition to value lists, DBMS/Compare can display a record
by record listing. In the screen shot below, you can see
the 5 databases for each record down the rows and across the
columns are the variables. Each cell shows the variable's
values. This display color codes the values that match
-- variable ABSID has a value of 102 for all but one database.

If you find the color matching confusing, just turn it off.

The record list can also be rotated to put the variables down
the rows and the databases across the columns. It all
depends on what you find helpful.

*** Output Options ***
Not only does DBMS/Compare let you interact with the information
but it can be saved to data files for further analysis, manipulation
and dissemination.
Database and Variable Output Options
The Database and Variable views can be written to Excel spreadsheets,
HTML files or ASCII text files. The views can be saved
in expanded mode, compressed mode or "unraveled" mode.
You've already seen expand and compressed modes previously --
they generate one row per database. Unraveled mode generates
one line per variable (for the database view) or one line per
database (for the variable view). Here is a short segment
of the HTML page generated for the databases in unraveled mode.
(not all variables on all databases are shown.) Each variable
gets one row, that row shows the file information, and one variable.
| Directory |
Filename |
Ext |
Date |
Records |
Variables |
Sequence |
Variable |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
1 |
_INDEX_ |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
2 |
_STAT_ |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
3 |
C_DISPL |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
4 |
C_FOREIG |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
5 |
C_GRATIO |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
6 |
C_HDROOM |
| C:\temp |
a |
sd2 |
2000/07/31 |
193 |
26 |
7 |
C_LENGTH |
| C:\temp |
a1 |
sd2 |
2000/07/26 |
74 |
12 |
1 |
DISPL |
| C:\temp |
a1 |
sd2 |
2000/07/26 |
74 |
12 |
2 |
FOREIGN |
| C:\temp |
a1 |
sd2 |
2000/07/26 |
74 |
12 |
3 |
GRATIO |
| C:\temp |
a1 |
sd2 |
2000/07/26 |
74 |
12 |
4 |
HDROOM |
| C:\temp |
a1 |
sd2 |
2000/07/26 |
74 |
12 |
5 |
LENGTH |
| C:\temp\vern |
a5 |
sav |
1999/10/13 |
74 |
1 |
1 |
MAKE |
| C:\temp |
abc |
sd2 |
2000/04/18 |
210 |
3 |
1 |
A |
| C:\temp |
abc |
sd2 |
2000/04/18 |
210 |
3 |
2 |
B |
| C:\temp |
abc |
sd2 |
2000/04/18 |
210 |
3 |
3 |
C |
| C:\temp |
abrdthrs |
sd2 |
2000/02/14 |
241 |
4 |
1 |
CUM_HRS |
| C:\temp |
abrdthrs |
sd2 |
2000/02/14 |
241 |
4 |
2 |
DATE |
| C:\temp |
abrdthrs |
sd2 |
2000/02/14 |
241 |
4 |
3 |
DIFF_HRS |
| C:\temp |
abrdthrs |
sd2 |
2000/02/14 |
241 |
4 |
4 |
RDT_HRS |
| C:\temp\spss |
aipq |
sav |
2000/09/19 |
696 |
11 |
1 |
CIGMOOD |
| C:\temp\spss |
aipq |
sav |
2000/09/19 |
696 |
11 |
2 |
CIGNOW |
| C:\temp\spss |
aipq |
sav |
2000/09/19 |
696 |
11 |
3 |
CONFID |
| C:\temp\spss |
aipq |
sav |
2000/09/19 |
696 |
11 |
4 |
CONTROL |
Value List Output
DBMS/Compare can write the value lists to either an Excel spreadsheet,
ASCII file or HTML page. You can select one variable,
all variables, or a specified list of variables. If you
write more than one variable to an Excel spreadsheet, each variable
will go on a separate page. Here is an example, as you
can see the variable names are the page tabs.

Record Comparison Output
DBMS/Compare can write the record comparison output to either
an Excel spreadsheet, ASCII file or HTML page. The output
will be just like the current window display.
In the Excel spreadsheet below, is the output of a match variable
record comparison listing. The first column is the overall
record number sequence. This example has the variable names
on the row dimension and the databases across the columns.
Since this is a match comparison, the record number from each
database is show.
If you write the data to an excel spreadsheet and there are
too many records to fit on one page, multiple pages will be
generated. If you fill up an entire workbook, multiple spreadsheet
files will be created. They will have the same name as the
first spreadsheet but with a sequential number appended. DBMS/Compare
can write any amount of data to Excel.

In Summary
DBMS/Compare is an exciting new program to help you make sense of your data
files.
Copyright © 2000 DBMS/Compare and DBMS/COPY are trademarks of Conceptual Software,
Inc.


|