Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

To acquire the CSV file from MS access we first converted the .mdb database into SQLite. Once we had access to the .sqlite3 file we created a python script to access it and execute SQL statements. Our python script generates a CSV file from the result table. 

  • Data location: gltg-source-data/Fox River Watershed Database
    • CSV for latest sqlite3 file: selected_final_foxdb_20171116.csv
    • Lastest MS access db: FoxDB_20171116.mdb
    • Converted sqlite3 file: foxdb_20171116.sqlite3 (please use this one)CSV for latest sqlite3 file: selected_final_foxdb_20171116.csv

  • Database design document:
  • Presentation at Nutrient Monitoring Council that explains what kinds of query Jong Lee did to extract the data from the database.

...

Code Block
languagesql
SELECT TBLSampletblsample.Stationstation_IDid, TBLStation_Information.Latitude, TBLStation_Information.Longitude, TBLStation_Information.Place_Name_Description , TBLSample.Sample_Code, TBLSample.IDLoc, TBLParameter_Codes.Full_Name, TBLParameter_Codes.Short_Name, TBLSample.Start_Date, TBLResults.Result_Value, TBLParameter_Group.Parameter_Group

FROM (((( (TBLResults INNER JOIN TBLSample ON (TBLResults.Sample_Code = TBLSample.Sample_Code) AND (TBLResults.IDLoc = TBLSample.IDLoc)) INNER JOIN TBLParameter_Codes ON TBLResults.Parameter_Code = TBLParameter_Codes.Parameter_Code) INNER JOIN TBLQAPPGroups ON TBLParameter_Codes.Parameter_Code = TBLQAPPGroups.Parameter_Code) INNER JOIN TBLQAPP_Group_Codes ON TBLQAPPGroups.QAPPCode = TBLQAPP_Group_Codes.QAPPCode) INNER JOIN TBLParameter_Group ON TBLQAPP_Group_Codes.Parameter_Group = TBLParameter_Group.Parameter_Group) Left JOIN TBLStation_Information ON TBLStation_Information.Station_ID = TBLSample.Station_ID

WHERE (((TBLParameter_Group.Parameter_Group)=9 Or (TBLParameter_Group.Parameter_Group)=10)); 
       tblstation_information.latitude, 
       tblstation_information.longitude, 
       tblstation_information.place_name_description, 
       tblsample.sample_code, 
       tblsample.idloc, 
       tblparameter_codes.full_name, 
       tblparameter_codes.short_name, 
       tblsample.start_date, 
       tblresults.result_value, 
       tblparameter_group.parameter_group 
FROM   (((( (tblresults 
             INNER JOIN tblsample 
                     ON ( tblresults.sample_code = tblsample.sample_code ) 
                        AND ( tblresults.idloc = tblsample.idloc )) 
            INNER JOIN tblparameter_codes 
                    ON tblresults.parameter_code = 
                       tblparameter_codes.parameter_code) 
          INNER JOIN tblqappgroups 
                  ON tblparameter_codes.parameter_code = 
                     tblqappgroups.parameter_code) 
         INNER JOIN tblqapp_group_codes 
                 ON tblqappgroups.qappcode = tblqapp_group_codes.qappcode) 
        INNER JOIN tblparameter_group 
                ON tblqapp_group_codes.parameter_group = 
                   tblparameter_group.parameter_group) 
       LEFT JOIN tblstation_information 
              ON tblstation_information.station_id = tblsample.station_id 
WHERE  (( ( tblparameter_group.parameter_group ) = 9 
           OR ( tblparameter_group.parameter_group ) = 10 ))


  • Python3 code:
Code Block
languagepy
""" Access a sqlite3 file,
    executes a SQL statement, and
    generates a CSV file for parsing

Variables
----------
data_loc : str
    The location of the database (for example, ./foxdb/foxdb_20171116.sqlite3)
query : str
    The SQL statement

"""
import sqlite3
import pandas as pd 

data_loc = ""
query = ""

conn = sqlite3.connect(data_loc)
df = pd.read_sql_query(query, conn)
df['Start_Date'] = pd.to_datetime(df['Start_Date'], errors='coerce') #use this to format date and time to only date
df.to_csv("output.csv", encoding='utf-8', index=False, date_format='%Y-%m-%d')

conn.close()

...