WTF: WHAT IS CSL?


This text is an early draft and part of the book “Menulis (ilmiah) itu menyenangkan” (“(Scientific) writing is fun”). This short article was inspired by a discussion we joined a few days ago about the citation style language used in thesis formatting. As usual, I wrote this document in text mode using Emacs org-mode, without MS Word. I hope it is useful.

1 Citation styles

You have surely seen a reference list (Daftar Pustaka). It contains the full details of every source you have cited in the text. It might look like this:

Irawan, DE., Silaen, H., Sumintadireja, P., Lubis, RF., 
Brahmantyo, B., and Puradimaja, DJ. (2014). 
Groundwater-surface water interactions of Ciliwung River streams, 
segment Bogor-Jakarta, Indonesia, Environmental Earth Sciences, 
73(7). doi:10.1016/j.obhdp.2007.08.002

If you compare two different journals, you will often find that their reference styles differ. This is because at least four citation styles are in common use around the world:

  • APA
  • MLA
  • Harvard
  • Vancouver

Each style has its own format, both for the reference list and for in-text citations. If you use only five or ten references, typing them by hand may be fine. Once you have 50 or more, however, you will need an application that supports citation management.
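To show why typing references by hand gets tedious, here is a minimal Python sketch. It renders the same metadata (the reference shown earlier, without its DOI) in two forms: a simplified APA-like author-date form and a simplified Vancouver-like form. The `Reference` class and both formatting functions are hypothetical illustrations. Real APA and Vancouver rules, and real CSL processors, handle many more cases, such as journal abbreviations, long author lists, and DOIs.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    # A tiny subset of real citation metadata for a journal article
    authors: list  # (family name, initials) tuples, e.g. ("Irawan", "DE")
    year: int
    title: str
    journal: str
    volume: str
    issue: str


def apa_like(ref):
    """Simplified APA-like author-date reference (not the full APA rules)."""
    # Initials "DE" become "D. E."
    names = [f"{family}, {'. '.join(initials)}." for family, initials in ref.authors]
    if len(names) > 1:
        author_part = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        author_part = names[0]
    return f"{author_part} ({ref.year}). {ref.title}. {ref.journal}, {ref.volume}({ref.issue})."


def vancouver_like(ref):
    """Simplified Vancouver-like reference (journal name left unabbreviated)."""
    author_part = ", ".join(f"{family} {initials}" for family, initials in ref.authors)
    return f"{author_part}. {ref.title}. {ref.journal}. {ref.year};{ref.volume}({ref.issue})."


ref = Reference(
    authors=[("Irawan", "DE"), ("Silaen", "H"), ("Sumintadireja", "P"),
             ("Lubis", "RF"), ("Brahmantyo", "B"), ("Puradimaja", "DJ")],
    year=2014,
    title="Groundwater-surface water interactions of Ciliwung River streams, "
          "segment Bogor-Jakarta, Indonesia",
    journal="Environmental Earth Sciences",
    volume="73",
    issue="7",
)

if __name__ == "__main__":
    print(apa_like(ref))
    print(vancouver_like(ref))
```

A citation manager does essentially this job for you. The difference is that it reads the rules for each style from a CSL file instead of hard-coding them.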

As I have often explained, a citation management setup consists of the following components:

  1. a citation manager application, e.g. Zotero, Mendeley, or EndNote;
  2. a browser connector: you can use Google Chrome, Mozilla Firefox, or Safari;
  3. a word-processor connector: you can use LibreOffice, TeXstudio (LaTeX), or Microsoft Office;
  4. citation style language (CSL) files: some come preinstalled with the citation manager, but you will need to install additional CSL files to match the requirements of the journal you are writing for.

2 CSL files

CSL stands for Citation Style Language. A CSL file is a plain-text file in XML format, so you can open it with an ordinary text editor such as Notepad. As the name suggests, it stores a citation style. The file name usually matches the name of the style, e.g. hydrogeology-journal.csl.

Its contents look roughly like this:

<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="en-US">
  <!--
   Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/springer
  -->
  <info>
    <title>Hydrogeology Journal</title>
    <title-short>Hydrogeol J</title-short>
    <id>http://www.zotero.org/styles/hydrogeology-journal</id>
    <link href="http://www.zotero.org/styles/hydrogeology-journal" rel="self"/>
    <link href="http://www.zotero.org/styles/springer-basic-author-date" rel="independent-parent"/>
    <link href="http://www.springer.com/cda/content/document/cda_downloaddocument/Key_Style_Points_1.0.pdf" rel="documentation"/>
    <link href="http://www.springer.com/cda/content/document/cda_downloaddocument/manuscript-guidelines-1.0.pdf" rel="documentation"/>
    <category citation-format="author-date"/>
    <category field="science"/>
    <issn>1431-2174</issn>
    <eissn>1435-0157</eissn>
    <updated>2014-05-18T01:40:32+00:00</updated>
    <rights license="http://creativecommons.org/licenses/by-sa/3.0/">
      This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License
    </rights>
  </info>
</style>
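A CSL style is plain XML, so you can also inspect it with a script instead of reading it by eye. The sketch below is an illustration under stated assumptions, not part of Zotero. It uses Python's standard `xml.etree.ElementTree` to pull out four things from a trimmed copy of the style above:

- the title,
- the id,
- the citation format,
- the `independent-parent` link.

The XML is embedded as a string so the example runs on its own, and `style_info` is a hypothetical helper name. The `independent-parent` link marks this as a dependent style, one that borrows its formatting rules from springer-basic-author-date.

```python
import xml.etree.ElementTree as ET

# CSL elements live in this XML namespace
CSL_NS = {"csl": "http://purl.org/net/xbiblio/csl"}

# Trimmed copy of the Hydrogeology Journal style shown above
STYLE_XML = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="en-US">
  <info>
    <title>Hydrogeology Journal</title>
    <id>http://www.zotero.org/styles/hydrogeology-journal</id>
    <link href="http://www.zotero.org/styles/springer-basic-author-date" rel="independent-parent"/>
    <category citation-format="author-date"/>
  </info>
</style>"""


def style_info(xml_text):
    """Return basic metadata from the <info> block of a CSL style."""
    root = ET.fromstring(xml_text)
    info = root.find("csl:info", CSL_NS)
    parent = info.find("csl:link[@rel='independent-parent']", CSL_NS)
    category = info.find("csl:category[@citation-format]", CSL_NS)
    return {
        "title": info.findtext("csl:title", namespaces=CSL_NS),
        "id": info.findtext("csl:id", namespaces=CSL_NS),
        "parent": parent.get("href") if parent is not None else None,
        "citation_format": category.get("citation-format") if category is not None else None,
    }


if __name__ == "__main__":
    print(style_info(STYLE_XML))
```

For a downloaded file, you would pass the file's text instead, e.g. `open("hydrogeology-journal.csl", encoding="utf-8").read()`. That file name is an assumption based on the example above.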

Confusing, isn’t it?

That is why you need to know the name of a CSL style before you download it from a CSL repository. Here we will use the Zotero citation manager, whose repository is at http://www.zotero.org/styles. Rather than learning to program CSL files, it is easier to search for the style name and download the file.


Figure 1: The Zotero CSL repository

3 The citation management workflow

Simply put, citation management can be described with the following flow diagram. Its main components are:

  • a citation management application,
  • a browser connector and a word-processor connector.

The workflow starts when you search for the references you need in a web browser (Chrome, Firefox, or Safari). Most users visit scientific databases such as:

  • Google Scholar
  • Scopus
  • ScienceDirect
  • ProQuest
  • etc.


Figure 2: The citation management workflow

When you find a paper you need, download its PDF and make sure you also download its citation info. Look for an “Export” or “Export citation” button, and choose the format that matches your citation manager. If you pick “BibTeX” or a plain-text format, most citation managers will recognise it.

Citation info is the metadata of each reference you download. You later import it into your citation manager, which turns it into citation items ready to be cited in the text. You can also attach the PDF to the corresponding citation item.
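To make “citation info” concrete, here is a Python sketch that reads one exported BibTeX entry into a dictionary, roughly what a citation manager does on import. It is deliberately limited: it only handles simple `field = {value}` pairs without nested braces, whereas real BibTeX importers are far more thorough. The entry reuses the reference from section 1 (DOI omitted), and `parse_bibtex_entry` is a hypothetical name.

```python
import re

# One entry: @type{key, ...fields...}
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# Simple fields of the form name = {value}, with no nested braces
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex_entry(text):
    """Parse a single, simple BibTeX entry into a dict of metadata."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a simple BibTeX entry")
    entry_type, key, body = match.groups()
    # Collapse line breaks and repeated spaces inside each value
    fields = {name.lower(): " ".join(value.split())
              for name, value in FIELD_RE.findall(body)}
    return {"type": entry_type.lower(), "key": key, **fields}


SAMPLE = """@article{irawan2014,
  author  = {Irawan, D. E. and Silaen, H. and Sumintadireja, P. and
             Lubis, R. F. and Brahmantyo, B. and Puradimaja, D. J.},
  title   = {Groundwater-surface water interactions of Ciliwung River streams,
             segment Bogor-Jakarta, Indonesia},
  journal = {Environmental Earth Sciences},
  year    = {2014},
  volume  = {73},
  number  = {7}
}"""

if __name__ == "__main__":
    for field, value in parse_bibtex_entry(SAMPLE).items():
        print(f"{field}: {value}")
```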

Alternatively, if you use Zotero and have installed the browser connector, you can click the connector button directly; it shows either a “Z” or a blue folder icon. A form appears asking which citations you need, or you can simply click “Select all”. The browser connector then downloads the citation info and the PDF (if available) and adds them to your Zotero library.

Before you do this, it is better to open Zotero first and select (or create) the target library or collection.

You can see the results right away in your Zotero collection. All citation info and PDFs are stored there. Quick and easy, isn’t it?

4 Example

Suppose you face a case like this:

A thesis/dissertation template prescribes a standard citation format. Question: what is the name of that citation style? APA, MLA, Harvard? Or does it follow a particular journal?

In that case, you should at least know the field of study, e.g. earth sciences or natural sciences. Once you know it, visit the CSL repository of your citation manager; for Zotero, that is http://www.zotero.org/styles.

Enter the field as a keyword. When you hover over a link in the search results, a preview of that citation style appears. Then simply compare it with the example you were given.

If you have done this and still cannot find a similar citation style, you may have chosen the wrong field, so try other keywords. If you are truly stuck, you will have to learn how to write your own CSL file: pick the most similar citation style and edit it. I cannot offer a tutorial yet, because I am still learning too.

Because a CSL file is a text file with a standard format, you are free to follow any of the many tutorials available online. If you use Zotero, though, I recommend the following link: https://www.zotero.org/support/dev/citation_styles/style_editing_step-by-step.


The WTF series


 

(image from: writing.wikinut.com)


Starting tomorrow, there will be a series of short posts on my blog onlinewaterbook.wordpress.com.

The WTF (Writing is Totally Fun) series.

I hope it keeps going, and that someone “applies” again.


WTF: How is Indonesia “found”? SEO for Academics

Dasapta Erwin Irawan,

Institut Teknologi Bandung


Figure 1 Loupe, from flickr/alainbachellier

This short piece continues my earlier post “Mengangkat nama Indonesia dari tulisan” (Raising Indonesia’s profile through writing). If you want the PDF version, you can get it here: http://goo.gl/9PJpWD.

This time I will talk about how Indonesia is “found”. I am not a historian, so please don’t take that phrase literally.

1 Introduction

This section is not strictly necessary. But I like writing an introduction, and I feel I have to.

1.1 Search Engines

What is a search engine?

A search engine is Google, to put it simply. More precisely, it is an application that searches for and analyses whatever the user types into the search box.

Are there others besides Google?

Yes:

  • Microsoft has Bing.
  • Those born in the 1970s-80s will surely know Yahoo, AltaVista (now gone), and Lycos. The last one seems to still be around.

1.2 Scientific databases

What’s that, then?

If you want to search specifically for scientific material, you can use scientific databases. Two of them are:

1.3 How do these applications work?

I can’t answer the technical side, since I am not an IT or computer science graduate. What is clear is that these applications look for the keywords the user has entered.

They open their database and find the documents that contain those keywords.
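As a very rough illustration of that matching idea, the Python sketch below checks which of a few paper records contain every query term in their title, abstract, keywords, or affiliation. Real search engines index, rank, and normalise text far beyond this. All records and helper names are hypothetical, invented for illustration.

```python
# Fields that a (very naive) search looks into
FIELDS = ("title", "abstract", "keywords", "affiliation")

# Hypothetical paper records, invented for illustration only
PAPERS = [
    {"id": "A",
     "title": "Groundwater chemistry of the Bandung Basin, Indonesia",
     "abstract": "We sampled wells across the basin.",
     "keywords": "groundwater; Bandung; Indonesia",
     "affiliation": "Institut Teknologi Bandung, Indonesia"},
    {"id": "B",
     "title": "Aquifer recharge in a volcanic basin",
     "abstract": "We model recharge in an upland catchment.",
     "keywords": "groundwater; recharge",
     "affiliation": "A university abroad"},
    {"id": "C",
     "title": "Aquifer recharge in a coastal plain",
     "abstract": "We model recharge near the coast.",
     "keywords": "groundwater; coastal aquifer",
     "affiliation": "A university abroad; Institut Teknologi Bandung, Indonesia"},
]


def matches(record, query):
    """True if every query term occurs (case-insensitively) in the searchable fields."""
    text = " ".join(record.get(field, "") for field in FIELDS).lower()
    return all(term in text for term in query.lower().split())


def where_found(record, term):
    """List the fields in which a single term occurs."""
    return [field for field in FIELDS if term.lower() in record.get(field, "").lower()]


def search(records, query):
    """Return the ids of all records matching the query."""
    return [r["id"] for r in records if matches(r, query)]


if __name__ == "__main__":
    print(search(PAPERS, "Indonesia"))
    print(where_found(PAPERS[2], "Indonesia"))
```

In this toy set, record C is found for “Indonesia” only through its affiliation.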

1.4 Where do they search?

Good question. This is where I start discussing “how Indonesia is found”.

Let’s do some role-playing.

2 Finding Indonesia?

Ready?…

Say you are a PhD student at a university abroad, and you want to study groundwater in Bandung. You start by writing a literature review. What is that? Read here, here, and here.

To do this, you open a few scientific databases, say Google Scholar and Scopus. If you want to know more about Scopus, you can read and download my slides on SlideShare.

You start typing some keywords:

  • “groundwater Indonesia”
  • “groundwater Bandung Indonesia”
  • “groundwater Bandung”
  • etc.

What do the three keywords above have in common? The location, of course. You might first look for information at the scale of Indonesia, then zoom in to the city of Bandung.

What do you expect to appear? Scientific papers whose titles contain the words Bandung and/or Indonesia, of course.

So mentioning the location in the title has a big effect. Whatever you write, make sure the location is in the title.

Where else do search engines find the word Indonesia?

Possibly in the abstract. So when you write, make sure your abstract mentions the location.

Where else?

In the keywords, which usually sit below the abstract.

Anything else?

Yes, in the author affiliations. In a scientific paper, author affiliations usually appear after the author names, or sometimes at the bottom left of the page (see the following figure).


Figure 2 Anatomy of a paper. Taken from my ResearchGate account

3 What is the current situation?

3.1 Number of scientific publications

What is the situation now? How many papers have been written by Indonesians, or by people with an Indonesian affiliation?

Here I will simply share a compilation from the Scopus database by Prof. Hendra Gunawan (Professor of Mathematics, ITB), posted in the following tweet.


Figure 3 Ranking of universities by number of papers produced (according to the Scopus database)

Please don’t focus on the individual institutions; look at Indonesia as a whole. We still lag far behind our own neighbour, Malaysia.

By the way, these results would very likely differ if we used Google Scholar (GS) or Microsoft Academic (MA).

Why?

Scopus mainly covers peer-reviewed papers and conference proceedings registered with Scopus. GS and MA are not limited to these two types.

3.2 Number of Indonesian students abroad

How many are there? Let’s look at the following information.

 This is a long-term, ongoing LPDP program, which every year sends off 3,000 of the nation’s best sons and daughters.

“In 2015 the number of returning graduates is expected to reach 900, and the target is for LPDP to have produced 60,000 national leaders by 2030,” said LPDP Director Eko Prasetyo.

Quoted from the LPDP Facebook page

These numbers cover the LPDP scholarship alone. There are many other scholarships, from

  • Dikti,
  • the Biro Kerjasama Luar Negeri (Bureau of Foreign Cooperation) of Dikbud, the Ministry of Education and Culture (sorry for using the old ministry name; it keeps changing),
  • etc.

Now, what does the number of students abroad have to do with anything?

I’ll explain. Be patient.

3.3 Students abroad vs. number of publications?

Isn’t it good that so many young Indonesians are studying abroad?

What does that have to do with the low number of publications?

I am not questioning their impact on Indonesia. I will only highlight one element of our students’ academic activity abroad.

Which activity?

Writing scientific papers.

So what’s the problem?

Recall the anatomy of a paper (Figure 2) and how search engines find Indonesia. One of the places they look is the author affiliation. Let’s look at a few possible cases:

  • Case 1: the author is a student at a university in Indonesia, with a study area outside Indonesia.

No need to discuss this one, since it rarely happens.

  • Case 2: the author is a student at a university in Indonesia, with a study area in Indonesia.

They should then write the word Indonesia in the title, abstract, and keywords of the paper.

So the paper will appear in searches with keywords like those above.

  • Case 3: the author is a student at a university abroad, with a study area in Indonesia.

They will then write the word Indonesia in the title, abstract, and keywords of the paper.

So the paper will appear in searches with keywords like those above. However, it will not add to the per-institution paper counts in Figure 3.

Why not? I’ll explain.

  • Case 4: the author is a student at a university abroad, with a study area abroad.

This happens very often.

They will then not write the word Indonesia anywhere in the paper.

So the paper will not appear in searches with keywords like those above.

4 Involve co-authors affiliated with institutions in Indonesia

Let’s focus on cases 3 and 4. In case 3, the paper will appear in searches for the keyword Indonesia, but it will not add to the list in Figure 3. The reason is that the affiliation listed in the paper is that of the foreign university.

In case 4, the paper will only appear if one of the authors lists an affiliation with an Indonesian institution.
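The difference between being found and being counted can be sketched in a few lines of Python. The records below are hypothetical, and “Abroad” stands for any foreign institution. The sketch makes two simplifying assumptions:

- the study area is named in the title, abstract, or keywords;
- a ranking like the one in Figure 3 counts a paper once for each affiliated country.

```python
from collections import Counter

# Hypothetical papers for the cases above; "Abroad" stands for any foreign institution
PAPERS = [
    {"case": 2, "study_area": "Indonesia", "affiliations": ["Indonesia"]},
    {"case": 3, "study_area": "Indonesia", "affiliations": ["Abroad"]},
    {"case": 4, "study_area": "Abroad", "affiliations": ["Abroad"]},
    # Case 4 again, but with an extra co-author at an Indonesian institution
    {"case": 4, "study_area": "Abroad", "affiliations": ["Abroad", "Indonesia"]},
]


def found_by_keyword(paper, country="Indonesia"):
    # Assumption: the study area is named in title/abstract/keywords,
    # and affiliations are searchable as well
    return paper["study_area"] == country or country in paper["affiliations"]


def count_by_affiliation(papers):
    # Assumption: a ranking counts each paper once per affiliated country
    return Counter(country for p in papers for country in set(p["affiliations"]))


if __name__ == "__main__":
    for p in PAPERS:
        print(p["case"], p["affiliations"], "found:", found_by_keyword(p))
    print(count_by_affiliation(PAPERS))
```

In this toy set, the case 3 paper is found but not counted for Indonesia. The case 4 paper with an Indonesian co-author is both found and counted.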

How can that be done?

For now, the only thing I can think of is to involve a co-author affiliated with an institution in Indonesia. Ideally, this additional author should work at an Indonesian government institution located in Indonesia. Invite your former supervisor, or a colleague who is studying in Indonesia.

There’s nothing wrong with that, is there?

Will your supervisor abroad agree?

As long as you ask properly, there is no reason for the professor to refuse, provided the author order is correct.

Why? Read my posts on Authorship (http://goo.gl/yQZ9Tr), here and here.

So place this additional co-author’s name according to their role, which will most likely be last. No problem, right?

This way, search engines will find your paper one way or another.

5 Closing

That’s the end of this short story about finding Indonesia. Apologies for the typos and shortcomings here and there. Sorry, too, that the jumbo-sized images have not been resized yet.

“Understandable, it’s still draft 1,” as one of my students working on a final project would say.

Regards, Erwin, Institut Teknologi Bandung. Find me on Twitter (@dasaptaerwin)

This text was written with R Markdown.

Visor your supervisor

Author: Dasapta Erwin Irawan
 
(image from: http://katdaley.blogspot.com)
 
One of the most important roles in your thesis is that of your supervisor. Unfortunately, supervisors are just ordinary people who happen to have the job and obligation of supervising your thesis work. By calling them people, I mean they won’t always be at your side supporting you with a wide smile. Many times you will find them just standing there, with no instructions or suggestions on how to do your thesis. So just bear with them, because they are one of your chances of getting your degree.
 
Basically, there are two types of supervisors, based on their time commitment:
  1. The busy ones: the kind you are most likely to meet.
  2. The not (yet) busy ones: mostly early-career researchers at your university, probably assisting one of the busy ones. If one of your supervisors is of this kind, you are lucky. But not for long: these early-career researchers won’t stay un-busy for long, as they too find ways to advance their careers.
If you then divide them by how they like to communicate with their students, you find the following two types:
  1. The tech-savvy ones: they use technology in their work. This kind of supervisor uses email, Skype sessions, and social media to communicate their science and to reach out to their students. Opening PDFs, reading slides, and compiling LaTeX scripts are not big concerns for them when dealing with you; your research is number one.

     If you have this kind of supervisor, you are lucky, because they can most likely communicate with you wherever and whenever. A casual conversation might be their style, with one drawback for you: their messages may arrive outside normal working hours. An unexpected meeting place could also be one of your challenges, e.g. a parking lot, an airport, a train station, or even a dark alley. You must be alert 24/7.
  2. The non-tech-savvy ones: at first, this kind of supervisor will give you a major headache. You will need formal, conventional meetings, and structured notes and materials are probably your main asset. You might need to copy his or her weekly schedule to make an appointment. Prepare your explanation carefully and develop a note-taking technique, as this supervisor may be a fast talker with very limited time for you. A formal conversation would be their style.

No matter which kind of supervisor you have, you must be a quick learner and adapt to his or her style. Remember to develop your verbal and written communication skills.

 
Good luck.

Several more things about data analysis

Outline

Part 1: Data in business

Author: Dasapta Erwin Irawan

1.1 Introduction

In this online era, we are surrounded by something called data. In the past, data was considered to relate only to laboratory work, school projects, and the like. Now data is all around us. Data used to be only something we measured; now it is also something we trade as goods. People are interested in anything that can be turned into observations or measurements, and some say data is the by-product of digital existence.

People tend to analyse anything. They are even interested in the rise and fall of “Jennifer” as a girl’s name over time. Data is now everyday talk in coffee shops, or even in the wet market, when people talk about the rise and fall of cabbage prices.

1.2 Data in business: why is it so important

1.2.1 Forecasting: we need to know the future

What’s all the excitement about data analysis? Forecasting is one thing. People always want to know what will happen in the future, given the existing conditions as a baseline, some assumptions, and the chance of disruption along the way. Forecasting is one of the main parts of business. It is so important that a business proposal is likely to be thrown away if it contains no data-based forecast.

A time series is a collection of measurements of a well-defined item, obtained through repeated measurements over time. For example, the value of retail sales measured each month of the year forms a time series, because sales revenue is well defined and consistently measured at equally spaced intervals. Data collected irregularly or only once are not time series.

A time series can be decomposed into three components:

- a trend component (long-term direction),
- a seasonal component (systematic, calendar-related movements),
- an irregular component (unsystematic, short-term fluctuations).

Decomposition is important in data analysis because what we see in a chart is the combined effect of these components, not each of them separately. For instance, consider a chart of turkey sales. Sales would likely be high around Thanksgiving, so our first guess would be a cyclic, seasonal phenomenon. But what if the peak shifts in a certain year? We wouldn’t expect the Thanksgiving data to have shifted. And what if there are more than two peaks of turkey sales in a year, and so on?
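The decomposition described above can be sketched as a classical additive decomposition: y = trend + seasonal + irregular. This is a minimal pure-Python illustration with two assumptions. The data are quarterly (period 4), and the series is synthetic, built from a known linear trend and seasonal pattern so the output can be checked. It is not the only decomposition method; multiplicative and STL-style decompositions also exist.

```python
def centered_moving_average(y, period):
    """Centred moving average; for an even period this is the usual 2 x period MA."""
    half = period // 2
    trend = [None] * len(y)  # the ends stay None: not enough neighbours
    for t in range(half, len(y) - half):
        window = y[t - half: t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # period + 1 points, with the first and last weighted 0.5
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + irregular."""
    trend = centered_moving_average(y, period)
    # Average the detrended values for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    raw = [sum(b) / len(b) for b in buckets]
    # Centre the seasonal indices so they sum to zero over one cycle
    mean_raw = sum(raw) / period
    indices = [s - mean_raw for s in raw]
    seasonal = [indices[t % period] for t in range(len(y))]
    irregular = [None if tr is None else value - tr - s
                 for value, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, irregular


# Synthetic quarterly series: linear trend plus a fixed seasonal pattern, no noise
PATTERN = [10, -5, -10, 5]
SERIES = [100 + 2 * t + PATTERN[t % 4] for t in range(12)]

if __name__ == "__main__":
    trend, seasonal, irregular = decompose(SERIES, 4)
    print("trend:    ", trend)
    print("seasonal: ", seasonal)
    print("irregular:", irregular)
```

For real work, libraries such as statsmodels provide `seasonal_decompose` in `statsmodels.tsa.seasonal`.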

Let’s think of data as something with its own behaviour. It may have a rhythmic, natural-born behaviour, or it may be erratic. It may have a stiff, dull attitude that is insensitive to external influence, or a flexible nature that is very sensitive to outside parameters. Or it may simply behave erratically, without any major controlling parameter.

As the following figure shows, forecasting is part of a loop: it analyses performance and transforms it into inputs for decision making. This loop drives the evolution of a business model or organisation.


Fig 1 A business cycle involving forecast in the loop (from: University of Baltimore web page)

1.2.2 Decision making: reaction to every action

There is a reaction to every action

I don’t mean physics by that, and probably neither do most of you. But like any organisation, a business is always changing; it evolves through the test of time. The Apple we see today is not the same company it was back in the 70’s.

Or, in contrast to that proprietary vendor, consider open-source software, say Linux, an open-source operating system. First built as a personal project by a Finnish computer science student, Linus Torvalds, Linux is now a multi-billion-dollar business. A technology that started out free for all has been transformed into a profit-oriented one.

The surprising news is that the free-for-all Linux is still making its way alongside its commercial side, and this can change the way people see free stuff. The operating system itself is still free, but the service of building and maintaining Linux systems is highly commercial. See, that’s evolution, and it involves data.

In the next article, we will talk about how to see correlations between parameters in a data set and how to build a model. In the last article, we will discuss a new profession called data scientist.

Part 2: Looking for correlation

We have discussed how data can change the form of a business or an organisation, and how those changes are driven by data forecasts. Now let’s turn to the second reason data is important in business. Beyond looking at a time series chart, people also need to know what correlations can be drawn from the data. Let’s use an example.

A simple correlation case was presented by Bryant and Smith in a 1995 work entitled Practical Data Analysis: Case Studies in Business. They described a data set of measurements taken on dining parties served by a single waiter in a restaurant. The variables include:

- the total bill ($),
- the tip ($),
- the gender of the bill payer,
- the day of the week,
- the tip as a percentage of the total bill.

They wanted to see which variable or variables have the strongest influence on the total tips in a week. They also compared the tips from male and female customers.

We can see in the chart that total bill and tip are positively associated (upper-left scatter plot). The association is not as strong as one might expect, though, because the variability in tip increases as the bill increases. Both tip and total bill have skewed distributions (upper-left histograms), which might lead the analyst to consider log-transforming these variables.
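An association like this is usually quantified with Pearson’s correlation coefficient. Below is a small Python sketch that also shows the log-transformation step mentioned above. The bill and tip values are invented for illustration and are not the Bryant and Smith data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bills and tips in dollars (invented, not the original data set)
bills = [10.0, 15.0, 20.0, 30.0, 45.0]
tips = [1.5, 2.0, 3.5, 4.0, 8.0]

r_raw = pearson(bills, tips)
# Correlation after log-transforming both skewed variables
r_log = pearson([math.log(b) for b in bills], [math.log(t) for t in tips])

if __name__ == "__main__":
    print(f"r (raw) = {r_raw:.3f}, r (log) = {r_log:.3f}")
```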

Males spend more on average than females and bills are higher on the weekend (shown in the side-by-side box-plots). The 70% tip on a very small bill by a male on a Sunday may be an outlier. Much can be learned about tipping behaviour by studying this chart.


Fig 2 An example of correlation chart of multiple variables (from: National Library of Australia)

But we must take into account that correlation doesn’t always mean causation. In the case above, there is indeed a correlation between the customer’s gender and their tipping behaviour, but what drives that behaviour has not been discussed yet. The answer generally lies underneath the numbers, and we have to dig it out.

Many studies are actually designed to test correlation, not causation. In general, it is extremely difficult to establish causality between two correlated observations. On the other hand, there are many statistical tools for establishing a statistically significant correlation.

You would be surprised how often common-sense conclusions about cause and effect are wrong. One reason is that a correlation can arise simply because two occurrences frequently happen together. A correlation may also reflect a strong causal link: it is well known, for example, that cigarette smoking not only correlates with lung cancer but actually causes it.

The hardest part is that, to establish cause, we would have to rule out other explanations for the observed correlation. One example is the possibility that smokers are more likely to live in urban areas, where there is more pollution.

So we can say that establishing causality can start from a series of correlations, but we have to add controlled variables to the analysis, as in the smoking example. We have to set assumptions and narrow down the potential governing variables. We call the result a model.
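One standard way to “add a controlled variable” is the partial correlation. It measures the association between x and y after removing the linear effect of a third variable z. The toy data below are hypothetical and constructed so that x and y are linked only through z. Think of z as playing the role of “urban residence” in the smoking example. The sketch shows what ruling out a confounder checks; it says nothing about real smoking data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling (linearly) for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: x and y each equal z plus a "noise" term, and the
# noise terms are uncorrelated with z and with each other
z = [2, 4, 6, 8]
x = [3, 3, 5, 9]  # z + [1, -1, -1, 1]
y = [3, 1, 9, 7]  # z + [1, -3, 3, -1]

if __name__ == "__main__":
    print(f"raw r_xy = {pearson(x, y):.3f}")
    print(f"partial r_xy.z = {partial_correlation(x, y, z):.3f}")
```

The raw correlation between x and y is clearly positive, yet it drops to zero once z is controlled for. This is the pattern you would expect if the whole link ran through the confounder.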

In the 3rd and 4th parts of this “Data Talks” article, we are going to talk about “Model” and “Data Scientist”.

Part 3: How to build a model

The model is the most basic element of the scientific method, and business is just as close to it as physics is. Probably without noticing, we have already talked about models in the earlier “Forecast” and “Decision Making” parts. Both rely on mathematical models in the form of equations. You surely know linear regression (see the following figure), or remember learning it in algebra. It is just one model among many others that do the actual forecasting for us.


Fig 3 An example of linear regression model (from: National Library of Australia)
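For a simple linear regression like the one in Figure 3, the least-squares line has a closed form:

- slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
- intercept = ȳ − slope·x̄

Here is a minimal Python sketch. The function names are hypothetical, and the data are a toy example lying exactly on y = 2x + 1.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to new x values."""
    return [intercept + slope * a for a in x]


if __name__ == "__main__":
    slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    print(slope, intercept)
    print(predict(slope, intercept, [10]))
```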

We also talk about models when we look at the business loop diagram in the previous article, or when we buy our children Hot Wheels or Barbie. Even a recipe is a model. So we could say a model is a simplification of what we are actually studying or trying to predict.

This is how we build a model:

  1. Data gathering. We talked about this in the forecast article. The data can be a long time series, such as a rainfall record, or come from a questionnaire.
  2. Setting the assumptions. Most models only work in a controlled environment, so we have to set boundaries. The more boundaries, the narrower our model will be. How many boundaries should we have? Answering this can be an iterative process with steps 3 and 4.
  3. Model fitting. This is the fun part. We can use major proprietary software such as Stata, SPSS, and SAS, or a free option such as R. These packages contain many models that we can pick and test.
  4. Model calibration (often called validation). Software also does much of this step for us. Basically, we apply our chosen equation to new data. If the results behave the same way as the data we modelled in step 3, we can say our model actually works. If not, we go back to step 3 or even step 2.
  5. Model application. This is the phase we like most. But over time, we have to re-evaluate our model against the current situation.
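Steps 3 and 4 can be sketched together: fit a model on one part of the data, then apply it to new data and check how far the predictions are off. This Python illustration rests on three assumptions. It uses a straight-line model, measures error with root-mean-square error (RMSE), and applies a hypothetical acceptance tolerance. All the data are invented.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (step 3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def validate(train_x, train_y, new_x, new_y, tolerance):
    """Fit on training data, then test on new data (step 4).
    Returns the error and whether it is within the (hypothetical) tolerance."""
    slope, intercept = fit_line(train_x, train_y)
    predictions = [intercept + slope * a for a in new_x]
    error = rmse(new_y, predictions)
    return error, error <= tolerance


# Hypothetical data: roughly y = 2x + 1 with small deviations
TRAIN_X = [1, 2, 3, 4, 5, 6]
TRAIN_Y = [3.1, 4.9, 7.1, 8.9, 11.1, 12.9]
NEW_X = [7, 8]
NEW_Y = [15.0, 17.0]

if __name__ == "__main__":
    error, ok = validate(TRAIN_X, TRAIN_Y, NEW_X, NEW_Y, tolerance=0.5)
    print(f"RMSE on new data = {error:.3f}, acceptable: {ok}")
```

If the check fails, the text’s advice applies: go back to step 3 (another model) or step 2 (different assumptions).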

Another thing we have to bear in mind is the Law of Simplicity: the simplest model has a higher chance of being accepted in a business environment. Top executives would probably care little about a model with 11 variables, and a data scientist (previously known as a data analyst) frequently chooses a model with two or three variables. In the 4th part of this “Data Talks” article, we will talk about the “Data Scientist”, a new, blossoming career for mathematicians, statisticians, and computer scientists.

Part 4: Data scientist

It was not until about five years ago that a new kind of profession, the “data scientist”, was invented. A data scientist represents an evolution of the business or data analyst role, typically with solid foundations in computer science and applications, modelling, statistics, analytics, and math. We are talking about a powerful career that can predict the future, talk about it, and persuade others. A good data scientist will not just address business problems; they will pick the right problems, the ones with the most value to the organisation.


Fig 4 A profile of a data scientist (from: EMC² web site)

The work of a data scientist more or less covers the following aspects (extracted from a Coursera forum):

  • Formulate context-relevant questions and hypotheses to drive data scientific research
  • Identify, obtain, and transform a data set to make it suitable for the production of statistical evidence communicated in written form
  • Build models based on new data types, experimental design, and statistical inference

Aside from proficiency in computer science, math, and statistics, a good data scientist must have curiosity, creativity, focus, and attention to detail.

Data scientists are needed wherever data is involved in an operation. Companies that hire data scientists include:

  • Construction companies
  • Utility companies
  • Oil, gas and mining companies
  • Hospitals and health care organisations
  • Colleges and universities
  • Federal, provincial/state and municipal government departments
  • Transportation companies
  • Telecommunications companies
  • Insurance, finance and banking organisations
  • Management consulting companies
  • Manufacturing companies

As we conclude our talk on data, it’s clear that

Numbers are not just numbers

They can speak

And it’s up to us to listen

 

 

Blogging since 2007, and still at it


Hello world

Before I knew it, I had been blogging since 2007. It started as a hobby, because WordPress was skyrocketing at the time. My first posts were, naturally, the equivalent of “Hello World”. They grew into more academic posts and, of course, posts about cars, especially retro cars.

But one topic I have never written about, and never will, is politics, even though in high school I really enjoyed the humour book “Mati Ketawa cara Rusia”. Well, that was just geeky me talking 😊.

My first blog now has more than 55 thousand visitors. Oddly enough, the statistics in the WordPress app show that even my oldest articles are still being read, such as the “Facts and Fiction” post I wrote about my favourite car brand, Peugeot. Something feels missing if I don’t post something (not only on the blog), even if it is just a short post.

You are what you blog

The quote is really true. I don’t want to be known as a political commentator. I only want to share knowledge about hydrogeology, science in general, and education, and maybe a few application tutorials. So you won’t find posts about the presidential election (read: copras-capres) in any of my blogs.

My posts are mostly academic, or about motivating my colleagues who are still students. That is something I never got as a student myself, maybe because the internet network at ITB back then was not as advanced as it is now. In those days, “Internet” was only known at roadside food stalls 😊. Can you guess what dish that is?

Write and brush your teeth

For me, writing (or rather, typing) has become a habit, like brushing my teeth. Try searching for my post on “menggosok gigi” (brushing teeth). Why did I post about that? Because once we are very used to an activity, we forget why or how we do it. It is exactly like brushing our teeth: our hands automatically move left-right and up-down, in a very controlled way. Why does it have to be like that? We have surely forgotten.

The same goes for writing: choosing words, linking words into sentences, sentences into paragraphs, and paragraphs into an article gradually becomes a habit. Before we know it, we have written 200 words (roughly the length of a thesis abstract). Our hands are so used to writing the date, the title, and the sections that we no longer know how we came up with them in the first place. People say that once you are used to writing, the next stage is getting used to explaining why you wrote what you wrote.

Just write then format later on

At first we need to make an outline before writing. Over time, the outline actually comes later: your hands write (or rather, type) so fast that only afterwards do you stop to put the outline together. It does not matter which way you use, as long as you can write.

Another reason why random writing works better is that our brain often works randomly, especially when it is idle (read: daydreaming). That is why I always carry a notebook everywhere, especially now that I no longer drive. The trip to campus is the perfect time for brainstorming. One more thing comes to mind: the most productive time for writing differs from person to person. In my case, 04.00 to 06.00 in the morning is the right time. What can I say, I’m a morning person. So just type whatever is passing through your mind. Once you are more “awake”, format and structure it.

Simple #bibliometric with Microsoft Academic using #R

#R: Simple #bibliometric comparison

Using: Google Scholar (GS), Microsoft Academic (MSA), Scopus (SCP), and Web of Science (WOS)


Introduction

In any kind of research, researchers must have a strong understanding of preceding research on the same or a similar subject. Master’s and PhD students, as researchers, must compose a literature review before they are permitted to start their research. The term literature review usually refers to a formal written document that summarises all previous related research.

The general steps are:

  • searching for articles that meet certain criteria, for example:
    • articles published in reputable journals,
    • abstracts presented at reputable conferences;
  • extracting the results from each article: what data it uses and how the authors analyse them;
  • summarising and compiling the results to establish a baseline for your research.

However, if we dig deeper, we find at least two kinds of literature review:

  • Annotated bibliography: What is an annotated bibliography? Here are several good definitions of the term:

An annotated bibliography provides a brief account of the available research on a given topic. It is a list of research sources that includes concise descriptions and evaluations of each source. UNSW

Another definition even gives a typical word count:

An annotated bibliography is a list of citations to books, articles, and documents. Each citation is followed by a brief (usually about 150 words) descriptive and evaluative paragraph, the annotation. The purpose of the annotation is to inform the reader of the relevance, accuracy, and quality of the sources cited. Cornell Univ.

  • Systematic review: [I will add this later on]

Hands on

Now we get to the real part: searching for references. There are many ways to find related readings and references. The old-fashioned way is to go to your university library. Tempting, huh? 🙂 I would suggest this as the best way. Not only will you get the document you have been looking for, but you will also feel the atmosphere there. Although there are more online documents nowadays, I would still sit in the library (if I have time). You might, by chance, find the oldest record on whatever you are looking for. Then there is always the internet, the backbone of researchers around the globe. The problem is where to look.

  • Search engines: Google, the most obvious man’s best friend. Of course there are others, like Bing, Microsoft Academic, and our old mate Yahoo. You might want to visit a list of search engines. But be careful when using Google, because it crawls any document that matches your keywords. A hit could be a real scientific paper in a scientific journal, a newsletter, or simply an email on a mailing list. In November 2004, Google improved on this by launching Google Scholar, which gives more refined results. Five years later, in December 2009, Microsoft launched Microsoft Academic.
  • Citation or scientific databases: we are already familiar with Scopus, ScienceDirect, ProQuest, and Web of Science. It is worth using both search engines and databases, since different companies are likely to have different databases and search algorithms. If you work at or are affiliated with a university that subscribes to any of these databases, you have already solved half of your problem :-).
  • Or perhaps your university has a cross-referencing system that queries multiple databases on the internet. Then you are the lucky one :-). Just type in your keywords and you get results from multiple sources. I’ll continue later with my own case of reference searching.

Google Scholar

add the result later

Microsoft Academic

Following my previous post on simple bibliometrics with Google Scholar (GS), this time I try the same steps with Microsoft Academic (MSA). The advantage of MSA is that it categorizes scientific entries, which GS does not offer. In this post I tabulated and compared the categories for several keywords. I used the following keywords:

  1. West Java
  2. Bandung
  3. Citarum
  4. Cikapundung
  5. Groundwater Bandung
  6. Groundwater Citarum
  7. Groundwater Cikapundung
  8. Health Bandung

The following list contains the categories that MSA builds automatically, with the short labels used in the data:

  1. Agriculture Science (agsci)
  2. Arts & Humanities (arthum)
  3. Biology (bio)
  4. Chemistry (chem)
  5. Computer Science (comsci)
  6. Economics & Business (ecobus)
  7. Engineering (eng)
  8. Environmental Sciences (envsci)
  9. Geosciences (geosci)
  10. Mathematics (math)
  11. Material Science (matsci)
  12. Medicine (med)
  13. Multidisciplinary (muldis)
  14. Physics (phy)
  15. Social Science (socsci)

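Since the point of using MSA is comparing categories across keywords, the whole comparison can also be held in a single table. Below is a minimal sketch using base R’s `xtabs()`, assuming a data frame with the columns `fields`, `key`, and `sum`, like the one read from the CSV file in this post; the helper name `tabulate_hits` is hypothetical. The demo input reuses the six Bandung rows printed by `head(bib)` in this post.

```r
# Sketch: cross-tabulate MSA hit counts, categories (rows) x keywords (columns).
# Assumes a data frame with the columns `fields`, `key` and `sum`,
# like `bib` in this post; `tabulate_hits` is a hypothetical helper name.
tabulate_hits <- function(df) {
  # xtabs() adds up `sum` for every combination of category and keyword
  xtabs(sum ~ fields + key, data = df)
}

# Demo input: the six "Bandung" rows printed by head(bib) in this post
demo <- data.frame(
  fields = c("agsci", "arthum", "bio", "chem", "comsci", "ecobus"),
  key    = "Bandung",
  sum    = c(16, 44, 129, 153, 406, 44)
)
hits <- tabulate_hits(demo)
print(hits)
```

With the full data set, `tabulate_hits(bib)` would give one column per keyword, so each category can be read across all eight keywords at once.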
I worked on this with the following code.

# load library
library("lattice")
library("gridExtra")

I used LibreOffice to prepare the data. Each keyword has 15 observations, one per MSA category; head(bib) shows the structure of the first rows.

# load data
bib = read.csv("20140523b-summary references.csv", header = TRUE)
head(bib)
##   no               fields2 fields     key dbase sum
## 1  1  Agriculture Science   agsci Bandung msacd  16
## 2  2    Arts & Humanities  arthum Bandung msacd  44
## 3  3              Biology     bio Bandung msacd 129
## 4  4            Chemistry    chem Bandung msacd 153
## 5  5     Computer Science  comsci Bandung msacd 406
## 6  6 Economics & Business  ecobus Bandung msacd  44
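Because every keyword should have exactly 15 rows, one per category, it can be worth checking that before plotting. A minimal sketch, assuming a data frame with a `key` column like `bib`; the helper name `keys_with_wrong_count` is hypothetical.

```r
# Sketch: list the keywords that do not have exactly `expected` rows.
# Assumes a data frame with a `key` column, like `bib` above; the helper
# name is hypothetical, and 15 is the number of MSA categories listed above.
keys_with_wrong_count <- function(df, expected = 15) {
  counts <- table(df$key)            # number of rows per keyword
  names(counts)[counts != expected]  # keywords that break the rule
}
```

Calling `keys_with_wrong_count(bib)` should return `character(0)` when every keyword has one row per category; any keyword it returns has missing or duplicated categories in the spreadsheet.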

I did the subsetting for each keyword.

# subsetting data
bib.wj = subset(bib, key == "West Java")
bib.bdg = subset(bib, key == "Bandung")
bib.ctr = subset(bib, key == "Citarum")
bib.ckp = subset(bib, key == "Cikapundung")
bib.gwbdg = subset(bib, key == "Groundwater Bandung")
bib.gwctr = subset(bib, key == "Groundwater Citarum")
bib.gwckp = subset(bib, key == "Groundwater Cikapundung")
bib.healthbdg = subset(bib, key == "Health Bandung")
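The eight `subset()` calls above can also be produced in one step with base R’s `split()`, which returns a named list with one data frame per keyword. A minimal sketch, assuming a data frame with `key` and `sum` columns like `bib`; both helper names are hypothetical.

```r
# Sketch: split() builds all per-keyword subsets in one call.
# Assumes a data frame with `key` and `sum` columns, like `bib` above;
# both helper names are hypothetical.
split_by_key <- function(df) {
  split(df, df$key)   # named list: one data frame per keyword
}

# Total number of MSA hits for each keyword, from the split list
total_hits <- function(by_key) {
  vapply(by_key, function(d) sum(d$sum), numeric(1))
}
```

For example, after `bib_by_key <- split_by_key(bib)`, the element `bib_by_key[["West Java"]]` plays the role of `bib.wj`, and `total_hits(bib_by_key)` gives one total per keyword. Adding a new keyword to the CSV then needs no new line of code.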

I used the lattice and gridExtra packages for plotting. You may use other packages, but then you will have to change the code.

# plotting
plot1 = xyplot(bib.wj$fields ~ bib.wj$sum, pch = 21, fill = "red", xlim = c(0, 
    8000), main = "key: West Java")
plot2 = xyplot(bib.bdg$fields ~ bib.bdg$sum, pch = 21, fill = "red", xlim = c(0, 
    8000), main = "key: Bandung")
plot3 = xyplot(bib.ctr$fields ~ bib.ctr$sum, pch = 21, fill = "red", xlim = c(0, 
    8000), main = "key: Citarum")
grid.arrange(plot1, plot2, plot3, ncol = 3)

(Figure plot1: panels for the keys West Java, Bandung and Citarum)


plot4 = xyplot(bib.gwbdg$fields ~ bib.gwbdg$sum, pch = 21, fill = "red", xlim = c(0, 
    50), main = "key: Groundwater Bandung")
plot5 = xyplot(bib.gwctr$fields ~ bib.gwctr$sum, pch = 21, fill = "red", xlim = c(0, 
    50), main = "key: Groundwater Citarum")
plot6 = xyplot(bib.healthbdg$fields ~ bib.healthbdg$sum, pch = 21, fill = "red", 
    xlim = c(0, 50), main = "key: Health Bandung")
grid.arrange(plot4, plot5, plot6, ncol = 3)

(Figure plot2: panels for the keys Groundwater Bandung, Groundwater Citarum and Health Bandung)


plot7 = xyplot(bib.ckp$fields ~ bib.ckp$sum, pch = 21, fill = "red", xlim = c(0, 
    10), main = "key: Cikapundung")
plot8 = xyplot(bib.gwckp$fields ~ bib.gwckp$sum, pch = 21, fill = "red", xlim = c(0, 
    10), main = "key: Groundwater Cikapundung")
grid.arrange(plot7, plot8, ncol = 3)

(Figure plot3: panels for the keys Cikapundung and Groundwater Cikapundung)
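As an alternative to building eight plots and arranging them with gridExtra, lattice can draw one panel per keyword from a single call by conditioning on `key` (the `| key` part of the formula). This is a sketch under assumptions, not the code used for the figures above: it assumes a data frame with `fields`, `key` and `sum` columns like `bib`, the helper name `plot_by_key` is hypothetical, and `relation = "free"` replaces the hand-set `xlim` values (8000, 50, 10) by letting each panel pick its own x range.

```r
library("lattice")

# Sketch: one conditioned xyplot() instead of eight separate plots.
# "| key" draws one panel per keyword; relation = "free" lets each panel
# pick its own x range, replacing the hand-set xlim values used above.
# Assumes a data frame with `fields`, `key` and `sum` columns, like `bib`.
plot_by_key <- function(df) {
  df$fields <- factor(df$fields)   # categories on the y axis
  df$key    <- factor(df$key)      # one panel per keyword
  xyplot(fields ~ sum | key, data = df,
         pch = 21, fill = "red",
         scales = list(x = list(relation = "free")))
}
```

The function only builds the trellis object; `print(plot_by_key(bib))` draws all panels on the current device, and lattice’s `layout` argument can control how many panels go in each row.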

Scopus

add the result later

Web of Science

add the result later

Note:

  • OS: Ubuntu 13.10
  • RStudio version: 0.98.507
  • R version: 3.1.0 (2014-04-10)


The one with the GTD (Getting Things Done)

I don’t mean to add another GTD (Getting Things Done) system to your library. So many people share their GTD systems that you have probably applied some of them already. I just want to share my daily activities, as well as my new habits in Sydney. If something interests you, drop a comment or send me an email. I saw this GTD by Jamie Todd Rubin (I’m going to look for the link later) and wrote this short article.

Currently I live in Sydney. I’m planning to stay here for one year, accompanying my wife while she pursues her PhD and taking sabbatical leave from ITB (Institut Teknologi Bandung). However, I also have collaborative research activities (the fun part) with Dr. Willem Vervoort from the Department of Environmental Sciences, University of Sydney. My academic activities will take place in the Australian Technology Park area (near Redfern train station).

Here’s my daily GTD. I will complete this post when I have time (scraps of time, if I may say so).

  • My computers and mobile devices
    • Laptop: MacBook Air (13,1” and 128 GB model, circa 2012)
    • Phone: iPhone 5s (Black-Space grey). It replaced a 3-year-old iPhone 3GS, which I had used since moving away from BlackBerry :-).
  • Other gadgets I use regularly
    • I used to record voice notes with a Sony digital recorder (1 GB), but now I use my iPhone for that.
    • I still use a classic “old skool” paper notepad to write down anything, mainly my to-do list. 🙂
  • Frequently-used apps
    • Google Apps: I practically live on the Google Apps system.
      • Gmail: I use it on my MacBook (with Google Chrome) and on my iPhone (with the standard Mail app).
      • Google Calendar: I put my whole to-do list and all my activities in the calendar on my iPhone, which is synced with Google Calendar. This is how my devices stay in sync.
      • Google Docs: I use this app when I’m online searching for references. I always open PDFs and doc/ppt/xls files (openable documents) with Google Docs, so that I don’t clutter my MacBook’s drive with files I don’t need (read: junk). They are automatically saved in my Google Drive account, so I can find them again whenever I need them.
      • Google Notes: I always use the Notes app on my iPhone to jot down whatever crosses my mind. The app is connected to my Google and iCloud accounts, so all my devices keep the same notes, and I can access them later when I need to develop them. Last but not least, due to the limited internet connection at my apartment, I often use the fast line at the office instead. When offline, I always use Growly Notes (you can paste anything you copy into it) to store my ideas for later retrieval.
      • Office app: My main office app is LibreOffice, because it is free (of course) and saves all documents in the default OpenDocument formats (to make sure they won’t catch viruses). I save only a few documents in Microsoft formats: the ones I think will need collaboration with colleagues who use MS Office. I always share my work as PDF (of course, after reducing the file size).
      • Evernote: I call it the souvenir of the 21st century, and it’s free!!! I use it on my Googling journeys to clip articles and pages I find interesting. I often take snapshots of my notes and sketches (from my notepad) and add a few bits of thought afterwards. I almost forgot: I always share my “discoveries” with my students using Evernote. I send them article links so that they know what I’m thinking and which references to use, or just light chit-chat to ease their.

(This picture is a typical Growly Notes view. It was the first draft of this article.)

Image

(below is the opening page of Evernote)

Image