...
To acquire the CSV file from MS access we first converted the .mdb database into SQLite. Once we had access to the .sqlite3 file we created a python script to access it and execute SQL statements. Our python script generates a CSV file from the result table.
- Data location: gltg-source-data/Fox River Watershed Database
- CSV for latest sqlite3 file: selected_final_foxdb_20171116.csv
- Lastest MS access db: FoxDB_20171116.mdb
- Converted sqlite3 file: foxdb_20171116.sqlite3 (please use this one)CSV for latest sqlite3 file: selected_final_foxdb_20171116.csv
- Database design document:
- All documents: http://ilrdss.sws.uiuc.edu/fox/fox_report_phase1.asp?ws=3
- General description http://ilrdss.sws.uiuc.edu/fox/downloads/Fox_Chapter_4.pdf
- Section 4.4.1 shows a good example of SQL query
- Presentation at Nutrient Monitoring Council that explains what kinds of query Jong Lee did to extract the data from the database.
...
Code Block | ||
---|---|---|
| ||
SELECT TBLSampletblsample.Stationstation_IDid, TBLStation_Information.Latitude, TBLStation_Information.Longitude, TBLStation_Information.Place_Name_Description , TBLSample.Sample_Code, TBLSample.IDLoc, TBLParameter_Codes.Full_Name, TBLParameter_Codes.Short_Name, TBLSample.Start_Date, TBLResults.Result_Value, TBLParameter_Group.Parameter_Group FROM (((( (TBLResults INNER JOIN TBLSample ON (TBLResults.Sample_Code = TBLSample.Sample_Code) AND (TBLResults.IDLoc = TBLSample.IDLoc)) INNER JOIN TBLParameter_Codes ON TBLResults.Parameter_Code = TBLParameter_Codes.Parameter_Code) INNER JOIN TBLQAPPGroups ON TBLParameter_Codes.Parameter_Code = TBLQAPPGroups.Parameter_Code) INNER JOIN TBLQAPP_Group_Codes ON TBLQAPPGroups.QAPPCode = TBLQAPP_Group_Codes.QAPPCode) INNER JOIN TBLParameter_Group ON TBLQAPP_Group_Codes.Parameter_Group = TBLParameter_Group.Parameter_Group) Left JOIN TBLStation_Information ON TBLStation_Information.Station_ID = TBLSample.Station_ID WHERE (((TBLParameter_Group.Parameter_Group)=9 Or (TBLParameter_Group.Parameter_Group)=10)); tblstation_information.latitude, tblstation_information.longitude, tblstation_information.place_name_description, tblsample.sample_code, tblsample.idloc, tblparameter_codes.full_name, tblparameter_codes.short_name, tblsample.start_date, tblresults.result_value, tblparameter_group.parameter_group FROM (((( (tblresults INNER JOIN tblsample ON ( tblresults.sample_code = tblsample.sample_code ) AND ( tblresults.idloc = tblsample.idloc )) INNER JOIN tblparameter_codes ON tblresults.parameter_code = tblparameter_codes.parameter_code) INNER JOIN tblqappgroups ON tblparameter_codes.parameter_code = tblqappgroups.parameter_code) INNER JOIN tblqapp_group_codes ON tblqappgroups.qappcode = tblqapp_group_codes.qappcode) INNER JOIN tblparameter_group ON tblqapp_group_codes.parameter_group = tblparameter_group.parameter_group) LEFT JOIN tblstation_information ON tblstation_information.station_id = tblsample.station_id WHERE (( ( tblparameter_group.parameter_group ) = 9 OR ( tblparameter_group.parameter_group ) = 10 )) |
- Python3 code:
Code Block | ||
---|---|---|
| ||
""" Access a sqlite3 file, executes a SQL statement, and generates a CSV file for parsing Variables ---------- data_loc : str The location of the database (for example, ./foxdb/foxdb_20171116.sqlite3) query : str The SQL statement """ import sqlite3 import pandas as pd data_loc = "" query = "" conn = sqlite3.connect(data_loc) df = pd.read_sql_query(query, conn) df['Start_Date'] = pd.to_datetime(df['Start_Date'], errors='coerce') #use this to format date and time to only date df.to_csv("output.csv", encoding='utf-8', index=False, date_format='%Y-%m-%d') conn.close() |
...