KDVis DB Tutorial
KDVis DB (i.e. kdvis.mdb) is a small database designed for parsing, cleaning, and analyzing data from ISI, EndNote, Bib, and other sources. Built on MS Access, it takes advantage of SQL queries and VBA scripts. Make sure you have MS Access (a component of MS Office, available to IU students at iuware.iu.edu) installed on your system before using this DB application.
The design of the KDVis data structure draws on the 15 basic elements of the Dublin Core Metadata Initiative. I would like to clarify a few concepts that would otherwise be ambiguous:
* Resource: a resource refers to an article/paper.
* Creator: a creator refers to an author/scholar.
Currently, KDVis DB can import data files in the following formats: ISI, EndNote, and BibTex. Examples are given below.
1) ISI format (isi.txt): A line beginning with "PT " marks the start of a new record, and a line beginning with "ER " marks the end of that record. Please remove any lines before the very first "PT ***" or after the last "ER".
2) EndNote format (endnote.txt): First export the references from EndNote to a .txt file using the "Show All" output style. Our parser reads the resulting .txt file one line at a time, checks which field the line belongs to, and stores the data in the corresponding field in the database. Some fields in the input file are concatenated into a single database field. The reference type field always occurs first, so it tells us to start a new article record. A blank line tells us we have reached the end of the record.
3) BibTex format (bib.txt): In this file, each entry starts with \bibitem and ends with a line ending in a full stop. The second line begins the author info, followed by the date in parentheses. Author info sometimes spans multiple lines; you should modify the file so that it occupies only one line. The following line has the title, which may also span multiple lines. The title ends with a comma, so if a line does not end with a comma, the title continues on the next line. The remaining lines have additional info.
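The record boundaries described above for the ISI and EndNote formats can be illustrated with a short Python sketch. This is not the KDVis parser itself (which is written in VBA inside kdvis.mdb); the function names are hypothetical, and the sketch only groups lines into records without interpreting individual fields. As a convenience that goes beyond the tutorial, the ISI function skips stray lines outside a "PT"/"ER" pair instead of requiring that they be removed by hand.

```python
def split_isi_records(lines):
    """Group ISI lines into records that start at "PT " and end at "ER"."""
    records, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("PT "):
            current = [line]                 # a new record begins
        elif current is not None:
            current.append(line)
            if line.rstrip() == "ER":        # tolerate "ER" with or without a trailing space
                records.append(current)      # the record is complete
                current = None
        # lines outside a PT...ER pair are ignored (assumption of this sketch)
    return records


def split_endnote_records(lines):
    """Group EndNote "Show All" lines into records separated by blank lines."""
    records, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            current.append(line)             # still inside the current record
        elif current:
            records.append(current)          # a blank line closes the record
            current = []
    if current:                              # the last record may lack a trailing blank line
        records.append(current)
    return records
```

Either function can be called on an open file, for example `split_isi_records(open("isi.txt", encoding="utf-8"))`, and returns a list of records, each of which is a list of raw lines.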
If MS Access is installed on your system, simply download the kdvis.mdb file and open it. After going through a few security-warning dialogs (how many depends on the security level of your system), you will see the following interface:
The main interface has six primary tabs (functions): Load Data, Clean Data, Filter Data (N/A for now), Analyze Data (N/A for now), Browse Data, and Export Data. You can click a heading tab (e.g. Browse Data) to do related operations. Each of the available functions is discussed below.
1) Load Data: With this function, you can import data from ISI, EndNote, and/or BibTex files. To do this:
1. Enter the full name of the file you want to import in the "File Name" textbox. (We no longer use an "Open file" dialog because of security constraints on some MS Access installations.)
2. If you want to remove all existing records before importing the new data, check the checkbox.
3. Make sure your file is in one of the supported formats (ISI, EndNote, or BibTex) and click the corresponding "Import..." button.
4. You might encounter error messages during the import. Usually this is because there are duplicate records (e.g. the same author name appears more than once in one paper), which are not allowed in the predefined data structure. In this case, simply click "Yes" to continue until the import finishes. You may click "No" at any error if you want to stop and double-check your data file.
2) Clean Data: For now, there is one feature that is useful for all three data formats: "Remove Duplicate Resources". Clicking "Remove Duplicate Resources" removes duplicate papers/articles, where two records count as duplicates if they match on the following columns: title, year (date_created), and venue (publisher).
3) Browse Data: After you successfully import data, you may open related tables to browse the data. (Note: if you are familiar with MS Access, you can open tables prefixed with "tmp_isi_" and/or queries such as "view_resources" through the database window.)
4) Export Data: KDVis provides tools to analyze the data and export the results for further analysis and/or visualization. Currently, you can examine co-author networks in the data and generate .net files for Pajek visualization. To do this:
1. Enter the full name (path & file name) of the .net file you want to generate in the "To file" textbox.
2. Specify a threshold for the target co-creator (co-author) network. For instance, a threshold of "2" means that a co-authorship is selected only if the two authors have co-authored at least two papers.
3. Click the export button.
4. Load the exported file (sample: isi.net) with Pajek.
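To make the threshold in step 2 concrete, here is a minimal Python sketch of a thresholded co-author network written out as Pajek .net text. It is an illustration under stated assumptions, not the export code inside KDVis: the function names are hypothetical, each paper is assumed to be given as a list of author names, and authors with no surviving co-authorship are left out of the vertex list (KDVis may treat them differently).

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers, threshold):
    """Count shared papers per author pair; keep pairs at or above the threshold."""
    counts = Counter()
    for authors in papers:
        # sorted(set(...)) drops repeated names and gives every pair a fixed order
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


def to_pajek(edges):
    """Render weighted edges as Pajek .net text (vertex numbers start at 1)."""
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {n}" for (a, b), n in sorted(edges.items())]
    return "\n".join(lines) + "\n"
```

With a threshold of 2, a pair of authors who share only one paper produces no edge, which matches the example given in step 2. The returned text can be written to a file whose name ends in .net and then loaded in Pajek.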
This application can be used for a variety of data sources as long as they can be mapped to the following relational components: resources, creators, connections (references), keywords, and categories. Future work will focus on mapping other data sources to this conceptual model and on analyzing citation, co-citation, and content-similarity (keywords, titles, and abstracts) networks based on this model.
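One way to picture this relational mapping is the following sketch, which uses Python's built-in sqlite3 module rather than MS Access. The five component names and the columns title, date_created, and publisher come from this tutorial; every other table, column, and function name is an assumption made for illustration and may not match the actual layout of kdvis.mdb. The second function applies the duplicate rule from the Clean Data tab (same title, year, and venue) to this hypothetical schema.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical relational layout for the five components."""
    conn.executescript("""
    CREATE TABLE resources (
        resource_id  INTEGER PRIMARY KEY,
        title        TEXT,
        date_created TEXT,   -- publication year
        publisher    TEXT    -- venue
    );
    CREATE TABLE creators (
        creator_id INTEGER PRIMARY KEY,
        name       TEXT
    );
    -- assumed link table; the composite key rejects the same author twice on one paper
    CREATE TABLE resource_creators (
        resource_id INTEGER REFERENCES resources(resource_id),
        creator_id  INTEGER REFERENCES creators(creator_id),
        PRIMARY KEY (resource_id, creator_id)
    );
    -- connections: one resource referencing another
    CREATE TABLE connections (
        from_resource INTEGER REFERENCES resources(resource_id),
        to_resource   INTEGER REFERENCES resources(resource_id)
    );
    CREATE TABLE keywords   (resource_id INTEGER, keyword  TEXT);
    CREATE TABLE categories (resource_id INTEGER, category TEXT);
    """)


def remove_duplicate_resources(conn):
    """Keep the lowest-id row in each (title, date_created, publisher) group."""
    conn.execute("""
        DELETE FROM resources
        WHERE resource_id NOT IN (
            SELECT MIN(resource_id) FROM resources
            GROUP BY title, date_created, publisher
        )
    """)
```

Calling `create_schema(sqlite3.connect(":memory:"))` builds the tables in a throwaway in-memory database, which is enough to experiment with the model before mapping a new data source onto it.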