Overview website: http://ilrdss.isws.illinois.edu/fox/
To acquire the CSV file from MS access we first converted the .mdb database into SQLite. Once we had access to the .sqlite3 file we created a python script to access it and execute SQL statements.
- Data location: gltg-source-data/Fox River Watershed Database
- Lastest MS access db: FoxDB_20171116.mdb
- Converted sqlite3 file: foxdb_20171116.sqlite3 (please use this one)
- CSV for latest sqlite3 file: selected_final_foxdb_20171116.csv
- Database design document:
- Presentation at Nutrient Monitoring Council that explains what kinds of query Jong Lee did to extract the data from the database.
View file |
---|
name | NMC-Fox-River-Data.pptx |
---|
height | 250 |
---|
|
- Possible SQL statements :
Code Block |
---|
|
SELECT TBLSample.Station_ID, TBLStation_Information.Latitude, TBLStation_Information.Longitude, TBLSample.Start_Date, TBLResults.Result_Value, TBLParameter_Group.Parameter_Group, TBLParameter_Codes.Full_Name, TBLParameter_Codes.Short_Name
FROM TBLStation_Information RIGHT JOIN (((((TBLResults INNER JOIN TBLSample ON (TBLResults.Sample_Code = TBLSample.Sample_Code) AND (TBLResults.IDLoc = TBLSample.IDLoc)) INNER JOIN TBLParameter_Codes ON TBLResults.Parameter_Code = TBLParameter_Codes.Parameter_Code) INNER JOIN TBLQAPPGroups ON TBLParameter_Codes.Parameter_Code = TBLQAPPGroups.Parameter_Code) INNER JOIN TBLQAPP_Group_Codes ON TBLQAPPGroups.QAPPCode = TBLQAPP_Group_Codes.QAPPCode) INNER JOIN TBLParameter_Group ON TBLQAPP_Group_Codes.Parameter_Group = TBLParameter_Group.Parameter_Group) ON TBLStation_Information.Station_ID = TBLSample.Station_ID
WHERE (((TBLParameter_Group.Parameter_Group)=9 Or (TBLParameter_Group.Parameter_Group)=10));
|
- SQL statement used to extract from sqlite3 file:*SQL statement:*
Code Block |
---|
|
SELECT TBLSample.Station_ID, TBLStation_Information.Latitude, TBLStation_Information.Longitude, TBLStation_Information.Place_Name_Description , TBLSample.Sample_Code, TBLSample.IDLoc, TBLParameter_Codes.Full_Name, TBLParameter_Codes.Short_Name, TBLSample.Start_Date, TBLResults.Result_Value, TBLParameter_Group.Parameter_Group
FROM (((( (TBLResults INNER JOIN TBLSample ON (TBLResults.Sample_Code = TBLSample.Sample_Code) AND (TBLResults.IDLoc = TBLSample.IDLoc)) INNER JOIN TBLParameter_Codes ON TBLResults.Parameter_Code = TBLParameter_Codes.Parameter_Code) INNER JOIN TBLQAPPGroups ON TBLParameter_Codes.Parameter_Code = TBLQAPPGroups.Parameter_Code) INNER JOIN TBLQAPP_Group_Codes ON TBLQAPPGroups.QAPPCode = TBLQAPP_Group_Codes.QAPPCode) INNER JOIN TBLParameter_Group ON TBLQAPP_Group_Codes.Parameter_Group = TBLParameter_Group.Parameter_Group) Left JOIN TBLStation_Information ON TBLStation_Information.Station_ID = TBLSample.Station_ID
WHERE (((TBLParameter_Group.Parameter_Group)=9 Or (TBLParameter_Group.Parameter_Group)=10)); |
...
Code Block |
---|
|
""" Access a sqlite3 file,
executes a SQL statement, and
generates a CSV file for parsing
Variables
----------
data_loc : str
The location of the database (for example, ./foxdb/foxdb_20171116.sqlite3)
query : str
The SQL statement
"""
import sqlite3
import pandas as pd
data_loc = ""
query = ""
conn = sqlite3.connect(data_loc)
df = pd.read_sql_query(query, conn)
df['Start_Date'] = pd.to_datetime(df['Start_Date'], errors='coerce') #use this to format date and time to only date
df.to_csv("output.csv", encoding='utf-8', index=False, date_format='%Y-%m-%d')
conn.close()
|
Resources:
- Python sqlite3 module
- Python Data Analysis Library