SOFTWARE/DBMSCOPY

 

DBMS/Comparetm

  • Do you have tens, hundreds, or thousands of databases and you aren't sure what's in them?
  • Have you made copies of tables and asked yourself:
    • Are they exact copies?
    • Have I deleted some fields?
    • Did I add extra calculated fields?
    • Is it a subset?
    • Does it contain more records for the same project?
  • Would you like one spreadsheet, text file or html page showing all variables on all databases?
  • Would you like to build an output file with all the values of all the variables on any number of databases?
  • Have you inherited a long term project and have no idea what is contained in what file?
  • Need to do work on that long ago forgotten project?

DBMS/Compare can help you answer these questions.

As hard drives have increased in size we tend not to delete data files.  Why should we?  They might contain useful information.  What's a little wasted disk space?  In a world of faster computers we are able to play "what-if" by building subsets, computing new fields and testing ideas with the data.  The end result is a disk drive full of data.  Some of it may be duplicates, some may be subsets and some of it the original source data.  DBMS/Compare, an exciting new program from the developers of DBMS/COPY, is here to help you make sense of your data files.

How does it work?

Select the Files

DBMS/Compare can search any directory and subdirectories looking for your files.  Here are the sample steps to follow:

  • Specify the starting directory
  • Decide if you want subdirectories scanned
  • Define the filename filter
  • Pick the file types you want to search or locate

The Database and Variable View

After DBMS/Compare finds all the files, it builds the following window.  The window has two views:  Databases and Variables.

The Database view, shows the following information on each file: the directory, filename, extension, creation date, record count, variable count, and a list of the variables on the file.

The Variable view, shows the variable names, the number of databases containing the variable, and a list of the databases.

In the screen shot below, the Database view shows the variable names in a "compressed" view.  The Variable view shows the database names in an "expanded" view.  The database names are in the column heading and each variable/database intersection has a colored block.  You can switch back and forth between compressed and expanded views.

Database Sorting Options

The database display can be sorted by: directory, filename, extension, creation date, or variable count.  

Here is the list of databases sorted by the record count.

Looking at the display you can quickly see that the 2nd and 3rd databases have the same record counts, same creation date and what is probably the same variable list.  (There is a sort option that will check for variable name matches.)  You can also see that the 4th and 5th databases look like they might be related to the other two databases because the variable names are the same except for the last two digits.  It might be that the data is for 1997 and 1998.)

Variable Sorting Options

The variable display can be sorted by variable name or the number of databases which contain a variable.  

There is one more sort option.  You can highlight one variable name and then sort the others based how many databases each variable has in common with the highlighted one.  The same sort can be done with the Database View -- highlight one database and see what other databases share the same variables.  This is great when you have one database in mind and want to see what other databases share the same variables.

Below is the Variable View with the variables sorted by closeness to the ABSID variable.  As you can see, variable BATCH is on 19 databases and 5 of them also have the ABSID variable.

Variable Name Mapping

Frequently variables that contain the same type of data have different names in different databases.  This might be because of naming conventions across projects or limitations of the package (name length for example).  When you want to compare these variables, it is necessary that they have the same name.  DBMS/Compare gives you the power to map variable names to other names.  Below is the Variable View showing that Address1 is mapped to ADDR1 and Address2 is mapped to ADDR2.  All mapping is accomplished with point and click.  Any number of variables can be mapped to any number of other variables.  Once you are done mapping, DBMS/Compare will rebuild the database and variable views to show the mapping.

Creating Subsets

Once you have identified a set of related databases, DBMS/Compare can create a "subset" view consisting of only those databases and their variables.  Any number of subsets can be created.  Subsets can be further subsetted.

Subsets are created by highlighting either database names or variable names.  If you click on a variable name, all the databases containing that variable will be added to the subset.  The first column in the "database" view indicates that the database will become part of the subset.

Below is a subset view of the databases containing the ABSID variable.  By specifying a title we make it easy to keep our subsets organized.

What's In the Database

So far, the DBMS/Compare tools we have shown organize general database information (name, creation date, number of records, number of variables and variable names on the database).  These tools help you understand the structure of the data and how the data files might relate to one another.

Now, we are going to show you the DBMS/Compare tools that help you see the data inside the file -- variable values and individual records.

Value Lists

Once you have a set of related databases, you might want to know what are the variable's values.  DBMS/Compare can build value lists for all variables on a set of databases.  The following table, shows the values of the ABSID variable across 5 databases.  Each table cell shows the frequency of that value on the database.  Each value is color coded based on the matching of counts.  The color coding aids in finding related databases.  For example, value 1502 is on 5 records for mfraprot and mrafsum, 31 records for mrafdrug and 11 times for mrafsurg.  It would appear that mrafprot and mrafsum are related, mrafprot always has the same or lower value count than mrafsum, may be it is a subset.

We can isolate the two databases in one value view so the comparison is easier.

Record Listings

In addition to value lists, DBMS/Compare can display a record by record listing.  In the screen shot below, you can see the 5 databases for each record down the rows and across the columns are the variables.  Each cell shows the variable's values.  This display color codes the values that match -- variable ABSID has a value of 102 for all but one database.

If you find the color matching confusing, just turn it off.

The record list can also be rotated to put the variables down the rows and the databases across the columns.  It all depends on what you find helpful.

*** Output Options ***

Not only does DBMS/Compare let you interact with the information but it can be saved to data files for further analysis, manipulation and dissemination.  

Database and Variable Output Options

The Database and Variable views can be written to Excel spreadsheets, HTML files or ASCII text files.  The views can be saved in expanded mode, compressed mode or "unraveled" mode.  You've already seen expand and compressed modes previously -- they generate one row per database.  Unraveled mode generates one line per variable (for the database view) or one line per database (for the variable view).  Here is a short segment of the HTML page generated for the databases in unraveled mode.  (not all variables on all databases are shown.)  Each variable gets one row, that row shows the file information, and one variable.

Directory Filename Ext Date Records Variables Sequence Variable
C:\temp a sd2 2000/07/31 193 26 1 _INDEX_
C:\temp a sd2 2000/07/31 193 26 2 _STAT_
C:\temp a sd2 2000/07/31 193 26 3 C_DISPL
C:\temp a sd2 2000/07/31 193 26 4 C_FOREIG
C:\temp a sd2 2000/07/31 193 26 5 C_GRATIO
C:\temp a sd2 2000/07/31 193 26 6 C_HDROOM
C:\temp a sd2 2000/07/31 193 26 7 C_LENGTH
C:\temp a1 sd2 2000/07/26 74 12 1 DISPL
C:\temp a1 sd2 2000/07/26 74 12 2 FOREIGN
C:\temp a1 sd2 2000/07/26 74 12 3 GRATIO
C:\temp a1 sd2 2000/07/26 74 12 4 HDROOM
C:\temp a1 sd2 2000/07/26 74 12 5 LENGTH
C:\temp\vern a5 sav 1999/10/13 74 1 1 MAKE
C:\temp abc sd2 2000/04/18 210 3 1 A
C:\temp abc sd2 2000/04/18 210 3 2 B
C:\temp abc sd2 2000/04/18 210 3 3 C
C:\temp abrdthrs sd2 2000/02/14 241 4 1 CUM_HRS
C:\temp abrdthrs sd2 2000/02/14 241 4 2 DATE
C:\temp abrdthrs sd2 2000/02/14 241 4 3 DIFF_HRS
C:\temp abrdthrs sd2 2000/02/14 241 4 4 RDT_HRS
C:\temp\spss aipq sav 2000/09/19 696 11 1 CIGMOOD
C:\temp\spss aipq sav 2000/09/19 696 11 2 CIGNOW
C:\temp\spss aipq sav 2000/09/19 696 11 3 CONFID
C:\temp\spss aipq sav 2000/09/19 696 11 4 CONTROL

Value List Output

DBMS/Compare can write the value lists to either an Excel spreadsheet, ASCII file or HTML page.  You can select one variable, all variables, or a specified list of variables.  If you write more than one variable to an Excel spreadsheet, each variable will go on a separate page.  Here is an example, as you can see the variable names are the page tabs.

Record Comparison Output

DBMS/Compare can write the record comparison output to either an Excel spreadsheet, ASCII file or HTML page. The output will be just like the current window display.

In the Excel spreadsheet below, is the output of a match variable record comparison listing.  The first column is the overall record number sequence.  This example has the variable names on the row dimension and the databases across the columns.  Since this is a match comparison, the record number from each database is show.

If you write the data to an excel spreadsheet and there are too many records to fit on one page, multiple pages will be generated.  If you fill up an entire workbook,  multiple spreadsheet files will be created.  They will have the same name as the first spreadsheet but with a sequential number appended.  DBMS/Compare can write any amount of data to Excel.

In Summary

DBMS/Compare is an exciting new program to help you make sense of your data files.


 

Copyright © 2000 DBMS/Compare and DBMS/COPY are trademarks of Conceptual Software, Inc.


 
Copyright © 2002 TStat All rights reserved via Baden Powell, 8/I - 67039 - Sulmona (AQ) - Italia