Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Database Conversion (MS access to Sqlite3)

Used mdbtools (linux) to convert MS access (mdb) to Sqlite3

Shell script

Code Block
languagebash
titlemdb2sqlite.sh
#!/bin/bash
# Inspired by 
# https://www.codeenigma.com/community/blog/using-mdbtools-nix-convert-microsoft-access-mysql

# USAGE
# Rename your MDB file to migration-export.mdb 
# run ./mdb2sqlite.sh migration-export.mdb
# wait and wait a bit longer...

now=$(date +%s)
sqlite=sqlite3
fname=$1
sql=${fname/mdb/sqlite}
schema=${fname/mdb/schema}
dir=${fname/.mdb/}-$now

mkdir $dir

mdb-schema $fname sqlite > $dir/$schema

for i in $( mdb-tables $fname ); do 
  echo $i  
  mdb-export -D "%Y-%m-%d %H:%M:%S" -H -I sqlite $fname $i > $dir/$i.sql
done

< $dir/$schema $sqlite $sql

for f in $dir/*.sql ; do 
  echo $f 
  (echo 'BEGIN;'; cat $f; echo 'COMMIT;') | $sqlite $sql
done
echo "Using $dir"



Overview website: http://ilrdss.isws.illinois.edu/fox/

To acquire the CSV file from MS access we first converted the .mdb database into SQLite. Once we had access to the .sqlite3 file we created a python script to access it, execute SQL statements, and generate a CSV file from the resulting table. 

View file
nameNMC-Fox-River-Data.pptx
height250

    Possible
  • SQL statement:
Code Block
languagesql
SELECT TBLSampletblsample.Stationstation_IDid, TBLStation_Information.Latitude, TBLStation_Information.Longitude, TBLSample.Start_Date, TBLResults.Result_Value, TBLParameter_Group.Parameter_Group, TBLParameter_Codes.Full_Name, TBLParameter_Codes.Short_Name

FROM TBLStation_Information RIGHT JOIN (((((TBLResults INNER JOIN TBLSample ON (TBLResults.Sample_Code = TBLSample.Sample_Code) AND (TBLResults.IDLoc = TBLSample.IDLoc)) INNER JOIN TBLParameter_Codes ON TBLResults.Parameter_Code = TBLParameter_Codes.Parameter_Code) INNER JOIN TBLQAPPGroups ON TBLParameter_Codes.Parameter_Code = TBLQAPPGroups.Parameter_Code) INNER JOIN TBLQAPP_Group_Codes ON TBLQAPPGroups.QAPPCode = TBLQAPP_Group_Codes.QAPPCode) INNER JOIN TBLParameter_Group ON TBLQAPP_Group_Codes.Parameter_Group = TBLParameter_Group.Parameter_Group) ON TBLStation_Information.Station_ID = TBLSample.Station_ID

WHERE (((TBLParameter_Group.Parameter_Group)=9 Or (TBLParameter_Group.Parameter_Group)=10));

 
       tblstation_information.latitude, 
       tblstation_information.longitude, 
       tblstation_information.place_name_description, 
       tblsample.sample_code, 
       tblsample.idloc, 
       tblparameter_codes.full_name, 
       tblparameter_codes.short_name, 
       tblsample.start_date, 
       tblresults.result_value, 
       tblparameter_group.parameter_group 
FROM   (((( (tblresults 
             INNER JOIN tblsample 
                     ON ( tblresults.sample_code = tblsample.sample_code ) 
                        AND ( tblresults.idloc = tblsample.idloc )) 
            INNER JOIN tblparameter_codes 
                    ON tblresults.parameter_code = 
                       tblparameter_codes.parameter_code) 
          INNER JOIN tblqappgroups 
                  ON tblparameter_codes.parameter_code = 
                     tblqappgroups.parameter_code) 
         INNER JOIN tblqapp_group_codes 
                 ON tblqappgroups.qappcode = tblqapp_group_codes.qappcode) 
        INNER JOIN tblparameter_group 
                ON tblqapp_group_codes.parameter_group = 
                   tblparameter_group.parameter_group) 
       LEFT JOIN tblstation_information 
              ON tblstation_information.station_id = tblsample.station_id 
WHERE  (( ( tblparameter_group.parameter_group ) = 9 
           OR ( tblparameter_group.parameter_group ) = 10 ))


Code Block
languagepy
collapsetrue
""" Access a sqlite3 file,
    executes a SQL statement, and
    generates a CSV file for parsing

Variables
----------
data_loc : str
    The location of the database (for example, ./foxdb/foxdb_20171116.sqlite3)
query : str
    The SQL statement

"""

import sqlite3
import pandas as pd 

data_loc = ""
query = ""

conn = sqlite3.connect(data_loc)

df = pd.read_sql_query(query, conn)
df['Start_Date'] = pd.to_datetime(df['Start_Date'], errors='coerce') #use this to format date and time to only date
df.to_csv("output.csv", encoding='utf-8', index=False, date_format='%Y-%m-%d')

conn.close()

Resources:

  • Presentation at Nutrient Monitoring Council that explains what kinds of query Jong Lee did to extract the data from the database.

View file
nameNMC-Fox-River-Data.pptx
height250