Category Archives: Data Analysis

Multivariate Analysis of Limestone Petrography Data On Kalipucang Formation Using R (iGeos 2017)

Multivariate Analysis of Limestone Petrography Data On Kalipucang Formation Using R

Achmad Darul1,#, Dasapta Erwin Irawan2, Jejen Ramdani1, Fauzan Septiana1, Siti Saniyyah Sholihat3
1 Fakultas Teknik dan Desain, Institut Teknologi Sains Bandung, Jalan Ganesha Boulevard LOT A-1 CBD Deltamas, Bekasi, Indonesia
2 Fakultas Ilmu dan Teknologi Kebumian, Institut Teknologi Bandung, Jalan Ganesha no. 10 Bandung, Indonesia
3 Fakultas Ilmu Pendidikan, Universitas Pendidikan Indonesia, Jalan Dr. Setiabudhi no. 229, Bandung, Indonesia

Abstract. Limestone is one of the most strategic construction materials. Its physical properties are controlled by its chemical properties. Different types of limestone can be distinguished by examining thin sections. However, classifying limestones from qualitative observation alone remains difficult. Geologists also need a method to classify large numbers of samples based on training data. This paper applies multivariate statistical techniques (principal component analysis and cluster analysis) to assist sample classification. We used 57 thin-section samples of the Kalipucang Formation from three locations: Pancatengah-Tasikmalaya (PCT), Cijulang-Ciamis (CJL), and Sindangsari-Ciamis (SDS). The open source R statistical package was used in the analysis. The result from our training data is consistent with the initial visual classification. Each location shows a distinct petrographic composition. Group 1 shows the dominant control of the depositional environment, with strong values of foraminifera, algae, carbonate mud, and coral fragments. Group 2 shows mixing with igneous rock, with plagioclase, opaque minerals, glass, and pyroxene. Group 3 shows mixing with transported sediment, with traces of quartz, iron oxides, and rock fragments. However, more trials with more data sets are needed to test this method.

Key words: Limestones, Multivariate analysis, Petrography.

Note: This abstract was presented at the iGeos International Conference 2017. We are currently revising the language. We also need to link our GitHub page for the data and R code. The complete first version can be accessed at the INA-Rxiv preprint server.
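
The workflow named in the abstract (principal component analysis followed by cluster analysis) can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the code used for the paper: the values are simulated, the three column names are picked from the components mentioned in the abstract, and the choice of Ward linkage and of three clusters is an assumption made for the example.

```r
set.seed(42)  # reproducible simulated values

# Simulated thin-section compositions (percent), NOT the Kalipucang data.
# Three hypothetical groups of 10 samples, each dominated by one component.
make_group <- function(n, means) {
  data.frame(
    foraminifera = rnorm(n, means[1], 2),
    plagioclase  = rnorm(n, means[2], 2),
    quartz       = rnorm(n, means[3], 2)
  )
}
samples <- rbind(
  make_group(10, c(40, 5, 5)),
  make_group(10, c(10, 30, 5)),
  make_group(10, c(10, 5, 30))
)

# Principal component analysis on centred and scaled variables
pca <- prcomp(samples, center = TRUE, scale. = TRUE)
summary(pca)

# Hierarchical clustering on the same scaled variables
# (Ward linkage and k = 3 are assumptions made for this sketch)
hc <- hclust(dist(scale(samples)), method = "ward.D2")
groups <- cutree(hc, k = 3)
table(groups)
```

With real data, the cluster labels in `groups` would be compared against the initial visual classification of each sample.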


Some visualisations of Bandung water quality data

Here I try out several more types of visualization to understand groundwater behavior from a groundwater quality data set. I have 142 data points of water quality measured in 2015. The dataset can be downloaded from our OSF repository. We are currently writing a paper from this data set based on multivariate analysis. I use free apps to produce all plots, and I will add more plots as the analysis moves along.


Mining PLOS and PubMed data


This post was inspired by Jon Tennant’s post on his blog (here). He discussed the number of paleontology papers published in PLOS ONE. His post was mainly based on his code using the `rplos` package (Github repo/CRAN repo) from the `ropensci` community. Jon’s post fired up my R life again, especially in the field of text mining. So in this post, I will connect his original post with my research on analyzing the biodiversity of the Cikapundung riverbank area (on Figshare).


Identifying hidden pattern in the hot water dataset using R

The following abstract is part of the full report on our small research project funded by the ITB Research Grant 2016. The code and dataset are available here, and the Markdown source of the complete report can be found here.


A problem with importing csv data into R

Assalamu’alaikum wr. wb. Good morning, R users.
I would like to share an experience I had while saving data for use with R. Learning really is an everyday job and knows no age. Although I have been using R since 2013 (“using”, mind you, not “programming”, because I am not a programmer 🙂), I still keep learning many things, even very basic ones.
(image borrowed from here)
One of them came up early this week, when I was about to import data in csv format. As usual, I entered and tidied the data in LibreOffice. The reformatting I usually do is:
  • check the column and row headers for merged cells;
  • rename the column and row headers so they are shorter without losing the key information;
  • check the data types: numeric or not, and so on.

Then I saved the table in csv format (Save As), before finally reading it into R.

The data is read with the usual command:
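
A minimal sketch of such a call, assuming base R's `read.csv`. The file contents and the column names `site` and `discharge` are hypothetical, and the file is written to a temporary path only so that the example runs on its own:

```r
# Hypothetical example file with two columns, written to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,discharge", "A,12.5", "B,9.8", "C,21.0"), csv_path)

# Read the csv into a data frame (header = TRUE is the read.csv default)
dat <- read.csv(csv_path)
head(dat)
```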


Then check the data types with the command:
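
One common way to do this check, assuming base R's `str` and `class`. The small data frame below is a hypothetical stand-in for the imported table:

```r
# Hypothetical data frame standing in for the imported table
dat <- data.frame(site = c("A", "B", "C"), discharge = c(12.5, 9.8, 21.0))

# Compact overview of every column and its type
str(dat)

# Class of each column as a named character vector
col_types <- sapply(dat, class)
print(col_types)
```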


Strangely, the data I expected to be numeric was read as factor. I went back to check the table and Googled how to convert the data type and so on, with no success!

Then I went to bed. Yes, really: when something is not working, whatever the problem, sleep on it. 🙂

The next morning I opened the spreadsheet again. On a whim, I changed the number format. The numbers had been formatted with both commas and points, and I reset them to the default. This means the numbers keep only their decimal mark (which may be a comma or a point), while the thousands separator is gone.

I saved the table as csv again and read it into R.

And voila!

The numeric columns that were previously read as factor are now read as numeric.

So if you still combine spreadsheet data entry with R like I do, remove all formatting, especially number formatting, before saving the table to an ASCII format (for example csv or txt).
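
The symptom can be reproduced with a small sketch. The csv text and column names here are hypothetical, and `stringsAsFactors = TRUE` is set explicitly because that was the `read.csv` default before R 4.0.0. The last step shows one possible way to repair the column inside R instead of in the spreadsheet; it assumes the comma is the thousands separator and the point is the decimal mark.

```r
# Hypothetical csv content: the "discharge" column was exported with
# thousands separators, so its values are quoted strings such as "1,250.5"
csv_text <- 'site,discharge
A,"1,250.5"
B,"980.0"
C,"2,100.75"'

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(raw)  # discharge is read as a factor, not as numeric

# Possible repair inside R: drop the thousands separator, then convert
clean <- raw
clean$discharge <- as.numeric(gsub(",", "", as.character(clean$discharge)))
str(clean)  # discharge is now numeric
```

Converting through `as.character` first matters: calling `as.numeric` directly on a factor returns the internal level codes, not the values.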

Moral of the story: Sleep on it 🙂

Introduction to R for Computational Chemistry Students

This morning I’ll be introducing R to the computational chemistry class in Room 9305, Labtek V (hosted by Dr. Rukman Hertadi, ResearchGate profile). Check out this map for the location. See you there.

The following Prezi slides can serve as a teaser.


A fresh start

I am writing this post back on my old blog, almost a year after moving to a new one. The reason is that the native WordPress analytics tools are much more complete than the ones on my new blog. I therefore need to reorganize both blogs. This old blog will contain material related to my research interests, while the new blog will cover open science and research methods.

Some future topics on this blog:

  1. shared roles in managing groundwater: a series of posts following our work to assist the Bandung City government in revising the old groundwater regulation.
  2. the use of multivariate analysis to classify water quality: frankly this is an old topic, but the upcoming updates will tell a story based on my papers that use R for sample classification. This research is funded by the ITB Research Grant 2016.
  3. an open source platform for a groundwater management dashboard: this activity is funded by the Directorate of Higher Education, Ministry of Research, Technology and Higher Education.