Working with a (moderately) large #dataset

++ Large dataset++
Be very careful when working with datasets. Check the rows and columns for inconsistencies. I hope this note is readable, since I wrote it on the train.

This skill is not taught much in class (Geology Program, ITB). I don’t know why, but it would be nice if undergraduate students built an integrated database of their field data. On average, they collect more than 50 observation points and more than 10 variables. To name a few:

  1. location id
  2. x coordinate
  3. y coordinate
  4. strike
  5. dip
  6. lithology
  7. grain size
  8. fabric
  9. sorting
  10. color
  11. porosity
  12. fresh/weathered
  13. sedimentary structure
  14. upper boundary
  15. lower boundary
  16. etc., and the list goes on
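One way to keep such a list of variables consistent is to give every observation point a fixed structure with simple range checks. The sketch below is an assumption of how part of that list might be encoded, not a prescribed schema: the field names are hypothetical, only a subset of the variables is included, and porosity is assumed to be recorded in percent. The checks rely on the usual conventions that strike lies between 0° and 360° and dip between 0° and 90°.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One field observation point (hypothetical subset of the variables)."""
    location_id: str
    x: float          # x coordinate (e.g. easting; the CRS is up to you)
    y: float          # y coordinate (e.g. northing)
    strike: float     # degrees, expected 0-360
    dip: float        # degrees, expected 0-90
    lithology: str
    grain_size: str
    porosity: float   # assumed to be in percent, 0-100
    weathered: bool   # True = weathered, False = fresh

    def problems(self):
        """Return a list of out-of-range values for this observation."""
        issues = []
        if not 0 <= self.strike <= 360:
            issues.append(f"strike {self.strike} outside 0-360")
        if not 0 <= self.dip <= 90:
            issues.append(f"dip {self.dip} outside 0-90")
        if not 0 <= self.porosity <= 100:
            issues.append(f"porosity {self.porosity} outside 0-100")
        return issues

# A hypothetical point with a dip that was probably mistyped in the field book.
obs = Observation("ST07", 789000.0, 9230000.0, 125.0, 95.0,
                  "sandstone", "medium sand", 18.0, False)
print(obs.problems())
```

Running `problems()` over every row before any analysis catches typos such as a dip of 95° early, while the field notes are still fresh.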

With such a database, students can analyse the data more quantitatively, e.g. with a histogram of porosity or a scatter plot between two parameters. I don’t know about other lecturers, but I think this is doable even for geologically hard-headed students :-).

Originally posted on my Path.


Published by


Research interests: hydrochemistry, multivariate analysis, and R programming. My current focus is determining the hydrostratigraphy of the volcanic aquifers in the Bandung area. The research is based on environmental isotope measurements in groundwater and on morphometry. My work consists of hydrochemical measurements, and I use multivariate statistical methods to provide a more quantitative foundation for the analysis and more insight into groundwater behaviour.
