Exercise: Gasoline Prices Dataset

These exercises are based on the "Gasoline Retail Prices Weekly Average by Region: Beginning 2007" dataset published by the State of New York, covering 2007 to 2022.

Some of these files have been modified. You will need to find and use the appropriate commands to restore them to their original state. If you don't know how to proceed, a hint is available to guide you.

The exercises are independent of one another, so you can solve them in any order. Once all exercises are solved, you can combine the results into a single file and compare it with the original.

Initial Setup

Download the zip file containing the dataset and extract it in a Linux environment.

Exercises

Final Validation

After fixing each of the listed issues, you can validate that all files are correct.

Run the "md5sum" command on each data file and compare the output with the values listed below.

Note: you must be in the top directory when running the following commands, otherwise the paths in the output will not match.


$ find -name 'data.csv' | sort | xargs md5sum
54052fc2aa4d3d1df3e4b96644f449c2  ./2007/data.csv
d1c548162bdfac82c1dcacaa16d82bce  ./2008/data.csv
7614ad533fb88e408fc416e5295e9f5d  ./2009/data.csv
9019920521b4839eaaaa793dfadd7827  ./2010/data.csv
d8103e3fd04907fdee983a12c8aa321b  ./2011/data.csv
3d9e9333acb02348c6d5d7e7b9b9eb46  ./2012/data.csv
6d8458305642c32b9234a61af3b11507  ./2013/data.csv
72e0b5c52a98be20a54be03a86f91820  ./2014/data.csv
8b9aad2829aac981d91b1afcf81f96aa  ./2015/data.csv
c4cc7eb8ee9a2f6b2813e10b3c40bc18  ./2016/data.csv
11e2233ff4448d4d492d2eb764095b04  ./2017/data.csv
ca7405812691d6a2ad8f9fe5582a4448  ./2018/data.csv
8f8b87eabeb17c4d5ab24b64cc29de5d  ./2019/data.csv
052a77ac52ec7c71054db965cf05e863  ./2020/data.csv
70cfc4ca7815db39fb1fea5f07ee1422  ./2021/data.csv
cd15b2376c2014d9364215dc7b606631  ./2022/data.csv
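
Instead of comparing the hashes by eye, a checksum list in the format shown above can be saved to a file and verified with "md5sum -c". The following is a minimal sketch on throwaway files. The temporary directory, the sample rows, and the file name checksums.md5 are assumptions made for illustration and are not part of the dataset.

```shell
# Build a small throwaway tree so the sketch runs anywhere (hypothetical data).
demo=$(mktemp -d)
mkdir -p "$demo/2007" "$demo/2008"
printf 'Date,Price\n01/01/2007,2.50\n' > "$demo/2007/data.csv"
printf 'Date,Price\n01/07/2008,3.10\n' > "$demo/2008/data.csv"

# Record reference checksums in the same "hash  path" format as the list above.
(cd "$demo" && find . -name 'data.csv' | sort | xargs md5sum > checksums.md5)

# Verify every listed file; md5sum prints one "OK" or "FAILED" line per file.
check_output=$(cd "$demo" && md5sum -c checksums.md5)
echo "$check_output"
```

For the real exercise, the same idea would be to paste the sixteen reference lines into a file in the top directory and run "md5sum -c" on it. Any file that still differs is reported as FAILED.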

Then you can rebuild the final file and compare it with the original one:


$ csvfiles=$(find -name data.csv | sort -r)
$ cat $csvfiles > data
$ sed '1!{/^Date/d;}' data > data.csv
$ rm data

$ md5sum data.csv Gasoline_Retail_Prices_Weekly_Average_by_Region__Beginning_2007.csv 
25530293e4ebbb9b988bc2f0f05fc18d  data.csv
25530293e4ebbb9b988bc2f0f05fc18d  Gasoline_Retail_Prices_Weekly_Average_by_Region__Beginning_2007.csv
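
To see what the sed expression in the rebuild step does, here is a small sketch on made-up data: it keeps the header on line 1 and deletes every later line that starts with "Date". The file names and rows are hypothetical. "cmp -s" is used here as one possible alternative to comparing the two hashes by eye.

```shell
work=$(mktemp -d)
# Two hypothetical yearly files, each carrying its own header line.
printf 'Date,Price\n12/26/2022,3.40\n' > "$work/a.csv"
printf 'Date,Price\n12/27/2021,3.30\n' > "$work/b.csv"
# Hypothetical "original" file with a single header at the top.
printf 'Date,Price\n12/26/2022,3.40\n12/27/2021,3.30\n' > "$work/original.csv"

# Concatenate, then drop every "Date..." header except the one on line 1.
cat "$work/a.csv" "$work/b.csv" | sed '1!{/^Date/d;}' > "$work/merged.csv"

# cmp -s prints nothing and exits 0 only when both files are byte-identical.
if cmp -s "$work/merged.csv" "$work/original.csv"; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

When the files differ, running "cmp" without "-s" reports the first differing byte and line, which can help locate an exercise that is not yet fully fixed.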