Life is thickly sown with thorns, and I know no other remedy than to pass quickly through them. The longer we dwell on our misfortunes, the greater is their power to harm us.

Voltaire

Some Corpora Available Online

  • CHAINS: Characterising Individual Speakers CHAINS is a research project funded by Science Foundation Ireland from April 2005 to March 2009. Its goal is to advance the science of speaker identification by investigating those characteristics of a person’s speech that make them unique.
  • The Blog Authorship Corpus The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words – or approximately 35 posts and 7250 words per person.
  • Old Bailey Corpus The proceedings of the Old Bailey, London’s central criminal court, were published from 1674 to 1913 and constitute a large body of texts from the beginning of Late Modern English. The Proceedings contain over 200,000 trials, totalling ca. 134 million words and its verbatim passages are arguably as near as we can get to the spoken word of the period. The material thus offers the rare opportunity of analyzing everyday language in a period that has been neglected both with regard to the compilation of primary linguistic data and the description of the structure, variability, and change of English.
  • CoRD | Corpus of Early English Medical Writing (CEEM) The Corpus of Early English Medical Writing is a corpus of English vernacular medical writing. Consisting of three diachronically divided subcorpora, the corpus covers the entire history of medical writing in English from the earliest manuscripts to the beginning of modern clinical medicine.
  • The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) is a 1.5 million word syntactically-annotated corpus of Old English prose texts. As a sister corpus to the Penn-Helsinki Parsed Corpus of Middle English (PPCME2), it uses the same form of annotation and is accessed by the same search engine, CorpusSearch. The YCOE was created with a grant from the English Arts and Humanities Research Board (B/RG/AN5907/APN9528). The corpus itself (the annotated text files) is distributed by the Oxford Text Archive and can be obtained from them free of charge for non-commercial use.
  • ARCHER Corpus (The University of Manchester) ARCHER is a multi-genre corpus of British and American English covering the period 1650-1990, first constructed by Douglas Biber and Edward Finegan in the 1990s. It is managed as an ongoing project by a consortium of participants at fourteen universities in seven countries.
  • Santa Barbara Corpus of Spoken American English The Santa Barbara Corpus of Spoken American English is based on a large body of recordings of naturally occurring spoken interaction from all over the United States. The Santa Barbara Corpus represents a wide variety of people of different regional origins, ages, occupations, genders, and ethnic and social backgrounds. The predominant form of language use represented is face-to-face conversation, but the corpus also documents many other ways that people use language in their everyday lives: telephone conversations, card games, food preparation, on-the-job talk, classroom lectures, sermons, story-telling, town hall meetings, tour-guide spiels, and more.
  • The Newcastle Electronic Corpus of Tyneside English (NECTE) The Newcastle Electronic Corpus of Tyneside English (NECTE) is a corpus of dialect speech from Tyneside in North-East England. It is based on two pre-existing corpora, one of them collected in the late 1960s by the Tyneside Linguistic Survey (TLS) project, and the other in 1994 by the Phonological Variation and Change in Contemporary Spoken English (PVC) project. NECTE amalgamates the TLS and PVC materials into a single Text Encoding Initiative (TEI)-conformant XML-encoded corpus and makes them available in a variety of aligned formats: digitized audio, standard orthographic transcription, phonetic transcription, and part-of-speech tagged. This website describes the NECTE corpus in detail, and makes it available to academic researchers, educationalists, the media in non-commercial applications, and organisations such as language societies and individuals with a serious interest in historical dialect materials.
  • The Limerick Corpus of Irish English The Limerick Corpus of Irish English (L-CIE) has been developed by the University of Limerick in conjunction with Mary Immaculate College, Limerick. This one-million word spoken corpus of Irish English discourse includes conversations recorded in a wide variety of mostly informal settings throughout Ireland. The corpus is a collection of naturally occurring spoken data from everyday Irish contexts. There are currently 375 transcripts (totaling over 1,000,000 words) available at this site.
  • American National Corpus The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. The ANC will provide the most comprehensive picture of American English ever created, and will serve as a resource for education, linguistic and lexicographic research, and technology development.
  • Great Britain (ICE-GB) @ ICE-corpora.net The British component of ICE is based at the Survey of English Usage, University College London. The British ICE corpus (ICE-GB) was released in 1998 and is now available. The corpus is POS-tagged and parsed.
  • WebCorp: The Web as Corpus WebCorp LSE is a fully-tailored linguistic search engine to cache and process large sections of the web.
  • WaCKy “We are a community of linguists and information technology specialists who got together to develop a set of tools (and interfaces to existing tools) that will allow linguists to crawl a section of the web, process the data, index and search them. We try to keep everything very laid-back and flexible (minimal constraint on data representation, programming language, etc.) to make it easier for people with different backgrounds and goals to use our resources and/or contribute to the project. We built a few corpora you can download, and in the near future we’ll have a web interface for direct online use of the corpora.”
  • VISL – Corpus Eye VISL’s grammatical and NLP research are both largely corpus based. On the one hand, VISL develops taggers, parsers and computational lexica based on corpus data, on the other hand these tools – once functional – are used for the grammatical annotation of large running text corpora, often with or for external partners (project list 1999-2009). The main methodological approach for automatic corpus annotation is Constraint Grammar (CG), a word based annotation method.
  • TIME Magazine Corpus of American English This website allows you to quickly and easily search more than 100 million words of text of American English from 1923 to the present, as found in TIME magazine. You can see how words, phrases and grammatical constructions have increased or decreased in frequency and see how words have changed meaning over time.
  • Regex Dictionary by Lou Hevly The Regex Dictionary is a searchable online dictionary, based on The American Heritage Dictionary of the English Language, 4th edition, that returns matches based on strings —defined here as a series of characters and metacharacters— rather than on whole words, while optionally grouping results by their part of speech.
  • Michigan Corpus of Academic Spoken English >> ELI Corpora & UM ACL The Michigan Corpus of Academic Spoken English (MICASE) is a collection of nearly 1.8 million words of transcribed speech (almost 200 hours of recordings) from the University of Michigan (U-M) in Ann Arbor, created by researchers and students at the U-M English Language Institute (ELI). MICASE contains data from a wide range of speech events (including lectures, classroom discussions, lab sections, seminars, and advising sessions) and locations across the university.
  • LDC – Linguistic Data Consortium New Corpora at the LDC include: Indian Language Part-of-Speech Tagset: Bengali ~100K words of manually annotated Bengali text; Message Understanding Conference 7 Timed (MUC7_T) ~ timed annotation for named entities; Asian Elephant Vocalizations ~57.5 hours of audio recordings of vocalizations by Asian Elephants; NIST 2005 Open Machine Translation (OpenMT) Evaluation ~ source data, reference translations, and scoring software used in the NIST 2005 OpenMT evaluation; TRECVID 2006 Keyframes & Transcripts ~ keyframes extracted from English, Chinese, and Arabic broadcast programming.
  • Linas’ collection of NLP data Here is a collection of linguistic data, including a collection of parsed texts from Voice of America, Project Gutenberg, the simple English Wikipedia, and a portion of the full English Wikipedia. This data is the result of many CPU-years worth of number-crunching, and is meant to provide pre-digested input for higher order linguistic processing. Two types of data are provided: parsed and tagged texts, and large SQL tables of statistical correlations. The texts were dependency parsed with a combination of RelEx and Link Grammar, and are marked with dependencies (subject, object, prepositional relations, etc.), with features (part-of-speech tags, verb-tense and noun-number tags, etc.), with Link Grammar linkage relations, and with phrasal constituency structure.
  • LexChecker LexChecker is a web-based corpus query tool that shows how English words are used. Users submit a word into the query box (like a Google search) and LexChecker returns a list of the patterns in which the word is typically used. Each pattern listed for a word is linked to sentences from the British National Corpus (BNC) that show the word occurring in that pattern. The patterns are what we have dubbed ‘hybrid n-grams’. These are a uniquely useful form of corpus search result. They can consist of a string of words such as keep a close eye on or gain the upper hand. Or they could contain substitutable slots marked by specific parts of speech, for example run the risk of [v-ing] or stand [noun] in good stead or [verb] a storm of protest (as in raise/spark/cause/create/unleash a storm of protest).
  • Forensic Linguistics Institute (FLI) Corpus of Texts Appeals, Blackmail and Extortion, Confessions, Death Row Final Statements, Declarations of War, Last Wills and Testaments, Miscellaneous, Statements by Police, Suicide Notes
  • David Lee’s Corpus-based Linguistics LINKS “These annotated links (c. 1,000 of them) are meant mainly for linguists and language teachers who work with corpora, not computational linguists/NLP (natural language processing) people, so although the language-engineering-type links here are fairly extensive, they are not exhaustive…”
  • CORPORA List The CORPORA list is open for information and questions about text corpora such as availability, aspects of compiling and using corpora, software, tagging, parsing, bibliography, conferences etc. The list is also open for all types of discussion with a bearing on corpora.
  • Corpora for Language Learning and Teaching
  • [bnc] British National Corpus The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.
  • AUE: The alt.usage.english Home Page This is the web site of the alt.usage.english newsgroup. Contains audio archive.
  • Project We Say Tomato
  • Corpus of Historical American English (COHA) COHA allows you to quickly and easily search more than 400 million words of text of American English from 1810 to 2009 (see details on corpus composition). You can see how words, phrases and grammatical constructions have increased or decreased in frequency, how words have changed meaning over time, and how stylistic changes have taken place in the language.
  • Speech Accent Archive The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph and are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.
  • IDEA – The International Dialects Of English Archive IDEA was created in 1997 as a free, online archive of primary source dialect and accent recordings for the performing arts. Its founder and director is Paul Meier, author of the best-selling Accents and Dialects for Stage and Screen, a leading dialect coach for theatre and film, and a specialist in accent reduction.
  • American Rhetoric: The Power of Oratory in the United States Database of and index to 5000+ full text, audio and video versions of public speeches, sermons, legal proceedings, lectures, debates, interviews, other recorded media events, and a declaration or two.
  • The AMI Meeting Corpus The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings.

No great mind has ever existed without a touch of madness – Aristotle

Don’t

Don’t go with the flow. Critically evaluate the status quo and create a new flow. Be the flow.

Don’t go with the crowd. Critically evaluate people’s views and establish your own. Be a light, not a shadow. Live by your own standards.

Stay hungry

Save the best for last? NO.

Have the best today and strive to get another one tomorrow. Stay hungry for improvement.

Image-Building

In essence, all humans are image-building creatures, always striving to present the self-image they want.

But I don’t post any photos; I don’t want to come across as ‘you know, like that’.

Silence does not mean meaninglessness. The absence of a picture does not mean the absence of an effort at self-image. When someone stays silent, that is precisely the image they want: the image of not coming across as ‘like that’, which is the opposite of coming across like that.

The point I want to make: go ahead and post your photos. #bebas (feel free), because everyone is always in the process of building a self-image, and that is fine. Pay no attention to Bu Tedjo.

Stephen King on Writing Block

 

“If you don’t have time to read, you don’t have the time (or the tools) to write. Simple as that.” ― Stephen King

You have to read widely, constantly refining (and redefining) your own work as you do so. It’s hard for me to believe that people who read very little (or not at all in some cases) should presume to write and expect people to like what they have written, but I know it’s true. If I had a nickel for every person who ever told me he/she wanted to become a writer but “didn’t have time to read,” I could buy myself a pretty good steak dinner. Can I be blunt on this subject? If you don’t have the time to read, you don’t have the time (or the tools) to write. Simple as that.
Reading is the creative center of a writer’s life. I take a book with me everywhere I go, and find there are all sorts of opportunities to dip in … Reading at meals is considered rude in polite society, but if you expect to succeed as a writer, rudeness should be the second-to-least of your concerns. The least of all should be polite society and what it expects. If you intend to write as truthfully as you can, your days as a member of polite society are numbered anyway.

Discourse is how a language is used socially to express meanings. Language is never neutral as it connects our personal and social worlds (Henry and Tator 2002).

Reading Block

Because writing is a reflective process in which writers critically reflect on concepts they obtain from reading and develop arguments based on the concepts they have acquired and understood, reading block can simply be understood as inadequate reading #NoDrama. Read!

Pedagogical Knowledge

We (myself included) often forget that knowledge of a particular subject is different from the pedagogical knowledge and skills needed to teach that subject. As a result, we often fall for the analogy that ‘a great football player makes a good coach for football school students’, or that ‘Mike Tyson is the best boxing coach because he was a world boxing champion’. Was Cus D’Amato a world-class boxer?

Knowledge and skills in a particular subject are indeed needed to teach it. But in education, subject knowledge and skills alone are not enough; there is something more important: pedagogical knowledge and skills.

A Nobel laureate in Physics could not necessarily teach junior high school (SMP) Physics in an Indonesian school. In other words, to teach Physics at junior high level in Indonesia, we do not need a Nobel laureate in Physics.

How large a share is needed of subject knowledge and skills, versus the pedagogical knowledge and skills for teaching that subject, demands more research. But my assumption is simple: the pedagogical knowledge and skills for teaching a subject must be at the highest level, #GasPol! (full throttle), while knowledge and skills in the subject itself need only be sufficient.

I would rather be in a class with a teacher who has great pedagogical ability.

Ref. Pedagogical Knowledge 

A message for myself

It’s easier for us to see our inabilities and doubts, which prevent us from trying, doing and achieving. Perfection is shadowing us, making it difficult to accept flaws and failures.

Well, my friends. Focus on abilities and confidence. Try, do and achieve. Perfection is a myth. It’s ok to get 7 out of 10, or even 5. Believe in yourself. Let’s accept our flaws and failures as part of our journey to achievements. Without the pain, our gain would be plain.

Finding yourself is not the right attitude, because this assumes that ‘yourself’ is there, somewhere, and your mission is to find it. What if you never find it?

Create yourself, because it’s always within you.

Don’t play their game

Everybody is playing a game and sometimes they will take you into the field and push you to play it. If you do, there will be two possible results: you win the game or you lose it. Because it’s not your game, the probability that you lose is greater. They set the rules.

Leave it.

Not playing the game doesn’t mean you lose. That means you have your own rules and won’t let others drag you. Leave their game.

Life is like riding a motorcycle.
You will face the wind and rain, but you always have the option to accelerate. Focus on the destination. Just make sure you keep your balance and enjoy the ride. Be grateful for the journey. #life

Perhaps each of us should realize that in the wilderness of knowledge, we are no more than rengginang crumbs or the loose flakes off a Malkist abon cracker. By knowing how vast knowledge is, we can become humble.

Why we should not hate people

Even hating and speaking ill of food is not allowed, let alone hating fellow human beings

We should simply take it in stride. Hating others because they differ from us is a denial of the fact that we too are imperfect, and that others may hate us for that imperfection. Hating others for being different can also be a form of arrogance, because we judge them to be no better than ourselves.

We should simply take it in stride, like facing an assortment of takjil. Just take what is good.

#RefleksiRamadan

The Power of Self-acceptance

Paijo actually joined in the laughter when several people in the room laughed at him. Perhaps out of annoyance, they mocked and laughed at Paijo with even more gusto. But Paijo really was odd. The louder they laughed, the louder Paijo laughed too. Bwahaha…

Perhaps their annoyance had peaked, because at last they shouted, ‘Gendeng kowe, Jo, tak guyu kok kamu ngguyu. Are you crazy? We’re laughing at you, so why are you laughing too?’

With a laugh he was clearly enjoying, Paijo replied, ‘Bwahaha… Never mind you, I laugh at myself too. I really am a fool, haha… I’m just sharing in your happiness at laughing at someone else’.

‘Pancen edan!’ (‘Truly crazy!’) one of them cursed as he moved away. Then everyone else walked away too, leaving Paijo alone in the room.

Paijo walked over to a chair at the side of the room. Sitting calmly, he enjoyed his coffee, which was no longer hot. He was unbothered.

Paijo is well trained in heartbreak. Bullying is a kind of biscuit he often tasted in the past. He is used to it. He knows that denying and fighting those who laugh at him will not make him feel better. On the contrary, accepting his own shortcomings and laughing along makes him stronger.

Quietly, Paijo murmured, ‘it’s the power of self-acceptance, bro’

ARDIAN.ID

Hi, I’m Ardian – a passionate learner. I’m a professional Language Consultant (LC) and the Chief Editor of Prosemantic – a professional translation and proofreading-editing service provider (visit s.id/prosemantic). I love twisting my thoughts into words.

In this home section, you’ll find short posts reflecting my views on common issues, sort of coffee-tea things and the like. I put links to research and teaching-learning materials in the resources section. Ruang Kelas is a page for my teaching activity. If you want to know more about me and my professional interests, visit the about section.

Cheers

 

Progress over perfection #invictus
Proofreading & Translation Services
s.id/prosemantic or WA 08117887000

Paradox

Like an appeal to save forests and trees that is printed on sheet after sheet of paper

Like people who hope for heaven and believe the world is only a temporary stopover, yet scramble for worldly things as if they would live forever

Believing that God judges people by the measure of their piety (taqwa), then judging fellow humans by the brand of their car.

That before God all humans are equal. In the eyes of humans, we belong to different social classes.

Ironic

Losing

Once in a while I let the children lose, so that they understand that being defeated is human, and so that they can accept the reality that life is not always about victory and success.

There is another, more important reason. I let them lose so that they become more sensitive and loving, able to empathize and to feel what others feel, so that they do not grow arrogant about victories and achievements.

Once in a while, let them lose, so that they do not grow into selfish people.

Dialogism in life

A friend came to me, fuming. ‘That writer has some nerve. He wrote sentences I cannot accept. I’m offended’.

‘Hold on, what’s up, bro?’ I asked over a faint call line; maybe the network was poor.

‘He wrote that sentence, and it must have been aimed at me. I’m furious’.

‘Haha.. Don’t get angry so easily, bro. We know that the meaning of a text is built from our own perception and understanding, not merely from the text as written. In other words, reading is dialogic; to borrow Bakhtin’s term, dialogism. The writer pours ideas into the words and sentences they choose, but how readers understand them depends on the readers’ receptiveness, on how readers digest the text. Someone who has just finished praying and then reads it reacts calmly because their emotions are settled. Someone who hasn’t eaten and is hungry reads the same sentence and gets angry. Like you.. Haha. Have you eaten yet?’

‘Not yet’ 😁

‘Let’s go eat. Ooh, so this guy was just hungry’ *%@ 😁😅

Systematic and Critical Thinking

Two things students must master in relation to ‘thinking’, the process of reasoning.

(1). Systematic Thinking: many students at various levels still cannot think systematically. Their ability to organize information is not yet good enough.

(2). Critical Thinking: many are not yet able, and not yet ‘brave’ enough, to think critically; they tend to accept concepts, ideas and views without sufficient critical evaluation.

These two skills are closely tied to how we organize, evaluate/analyze, and present information (including data).

If these two skills are weak or lacking, the organization, evaluation/analysis, and presentation of information usually suffer as well.

Students should practice systematic and critical thinking often, so that they can produce thinking that is well organized, grounded in sound evaluation/analysis, and presented clearly enough to be easily understood.

#Thinking

Students out there

Around 2008, together with several friends, I was involved in the Microsoft for Indonesia project. One of the project’s aims was to measure the computer literacy of teachers and students in several regions.

There was much that was encouraging. However, we also found conditions that saddened us.

In general, schools already had computers (and labs), but many students and some teachers had no access to use them (access, mind you, not ownership of a computer or laptop). At that time, many teachers still had no personal computer or laptop, so one computer was shared by 4 to 6 teachers.

What about the students? You can guess.

When we were ‘forced’ to switch to learning over the internet, they were the ones always on my mind. I hope their access keeps improving. I hope their economic circumstances have improved, so that they can take part in the mode of learning now in place.

You’re a doughnut

Imagine this. You’re a sweet, soft and fluffy doughnut. Would you change yourself into salty chicken porridge just because someone is trying to irritate you? No, please don’t.

Anytime in the future, when someone is trying to insult you, don’t change yourself. Say this to yourself…

I’m a sweet, soft… fluffy doughnut 🍩

Student well-being

Besides the questions ‘how’s the progress?’ and ‘what chapter are you working on?’, another question my doctoral research supervisor often asked was ‘are you ok?’ or ‘are you feeling ok?’.

Yes, student well-being is one of the important aspects of study. Study (and almost every campus service) is strongly student-oriented. During the course of study, the supervisor is responsible for two things: completion of the research and student well-being.

I will focus on student well-being. What is its impact on students? I felt comfortable and enjoyed my time as a student (in fact, if I could study again, I would 😁 #ngarep). I feel I gained a great deal, not just knowledge but life experience. Living experience, as promised in the advertisements of overseas universities. Huh? What on earth is living experience?

At first I thought this was just marketing jargon, but it turned out not to be. As a research student, I spent a lot of time on campus. In fact I was on campus more often than not, borrowing its comforts 😁… You could say I lived on campus. A real living experience. I think this also applies to universities at home. For students, university is not merely a process of acquiring knowledge and skills. More than that, it is a life experience, one that shapes the next life choice: whether to study again or to swear off studying for good 😁 #trauma

Back to ‘are you feeling ok?’: I often hear the opposite story from friends pursuing master’s (S2) and doctoral (S3) degrees at home. These are mostly horror stories, about supervisors who are fierce or barely present, to the point that students are ignored and treated like ghosts. Hopefully that is not the case.

Two important things: student well-being, and university as a living experience

Teaching

Teachers should be willing to climb down and simplify things to reach students’ understanding and then lift the students to a higher level.

Teachers should not be selfishly standing on the top of the ivory tower without offering helping hands.

Corpora Resources

Online Corpus

PolyU Language bank Over 36 million words of multilingual, multi-genre corpora free
RCPCE Profession-specific Corpora A large collection of texts used in different professions in Hong Kong free
A Query to Internet Corpora (Leeds U) Updated general-purpose online corpora with different languages
British National Corpus (1980-1993) A standard English corpus often used as a reference corpus.
British Academic Written corpus (BAWE) A 6-million-word collection of student essays in different disciplines
Business Letter Corpus A corpus with different English letters
BYU Corpora A collection of mega-corpora, including the BNC and NOW (News on the Web, from 2010 to yesterday)
The Corpus of Contemporary American English (COCA, 1990-present) Representative of modern American English
Time Magazine (1923-2006) A corpus for diachronic language study free
GloWbE (Global Web-Based English) 1.9 billion words of English used in 20 countries free
MICASE Transcripts of a wide range of spoken academic texts from the University of Michigan. free
The Oxford Text Archive The Archive develops, collects, catalogues and preserves a variety of electronic literary and linguistic resources free
WebCorp Allows corpus-type searches of documents in English on the Internet. free
CQP Web for Language Corpora A collection of corpora created by the Language and Multimodal Analysis Lab (LAMAL), Department of English, The Hong Kong Polytechnic University free
Fashion Communication Corpus (FCC) A 1-million-word collection of texts obtained from fashion magazines, literature, journals, websites etc. free
Enron email corpus Enron email data sets compiled at UC Berkeley free
Corpora maintained by Geoffrey Sampson A collection of different texts

 

Parallel Corpus

Bilingual Parallel Corpora of Chinese Classics Parallel texts of Chinese classic novels and government documents
English-Chinese parallel concordancer A collection of novels, fables and essays free

 

Text Archive

The Gutenberg Project The pioneering project designed to make non-copyright text available electronically free
Internet Archive The Internet Archive Text Archive contains a wide range of fiction, popular books, children’s books, historical texts and academic books. free
Internet Archive: Wayback Machine The Wayback Machine is a digital archive of the World Wide Web and other information on the Internet. You can check the Wayback Machine for archives of a website. free

 

Word Cloud

Voyant Tools Creates word clouds based on word frequency free
Wordle Wordle is a tool for generating “word clouds” from text that you provide. free

 

Corpus Tools

AntConc A freeware concordance program for Windows. Please visit Laurence Anthony’s Website for the complete list of software. free
AntCorGen A freeware discipline-specific corpus creation tool. free
ConcGram 1.0 ConcGram 1.0 is a corpus linguistics software package which is specifically designed to find all the co-occurrences of words in a text or corpus irrespective of variation.
ConcGramCore ConcGramCore is an open source corpus linguistics software package for corpus linguists to find all the co-occurrences of words in a text or corpus irrespective of variation. The software is in continuous development. free
ParaConc A bilingual or multilingual concordancer that can be used in contrastive analyses and translation studies free trial
WordSmith Tools Concordancing, word lists, key words
Leximancer Lexical analysis free trial
WMatrix In addition to frequency lists and concordances, WMatrix extends the keywords method to key grammatical categories and key semantic domains. free trial
Sketch Engine Sketch Engine can provide a one-page summary of the word’s grammatical and collocational behavior, showing the word’s collocates categorised by grammatical relations.
ATLAS.ti (7) For qualitative data analysis and discourse analysis free trial
NVivo (10) For qualitative data analysis and discourse analysis
kfNgram kfNgram makes n-gram indices of any text(s) you give it, similar to WordSmithTools’ Cluster function. free
The IMS Open Corpus Workbench free
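
As a rough illustration of the n-gram indices that kfNgram is described as producing in the list above, here is a minimal Python sketch. It is not kfNgram's implementation: the function name `ngram_index`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def ngram_index(text, n=2):
    """Count every run of n consecutive tokens in text.

    Hypothetical helper for illustration only: tokens are simply
    lowercased, whitespace-separated strings, with no punctuation handling.
    """
    tokens = text.lower().split()
    # Zip n staggered copies of the token list to get sliding windows.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


if __name__ == "__main__":
    index = ngram_index("keep a close eye on a close friend", n=2)
    for gram, count in index.most_common(3):
        print(count, gram)
```

A real tool would need a proper tokenizer and would probably sort or filter the index by frequency; the sketch only shows the core sliding-window idea.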

 

Lexical Analysers

The Ultimate Research Assistant Lexical semantic thematic analysis of web documents free

 

Taggers

CLAWS Word class (part-of-speech) tagger free
Stanford Log-linear Part Of Speech tagger Different software for POS tagging free
Stanford CoreNLP online engine Online interface of the Stanford CoreNLP software. Click here for more information on the package. free
GUM The Georgetown University Multilayer Corpus free

 

Phonetic Analysis

Praat Praat (the Dutch word for “talk”) is a free scientific software program for the analysis of speech in phonetics. free
EMU (The Emu Speech Database System) EMU is a collection of software tools for the creation, manipulation and analysis of speech databases. free
WaveSurfer WaveSurfer is an Open Source tool for sound visualization and manipulation. free
SpeechAnalyzer Speech Analyzer is a computer program for acoustic analysis of speech sounds. free

 

Development Workbench

KPML Workbench for developing grammatical descriptions and defining computational grammars free
TermBase Database for developing and storing terminologies free

 

Descriptive Resources

WordNet A lexical database organizing nouns, verbs, adjectives and adverbs into synonym sets, each representing one underlying lexical concept. free
FrameNet A lexical database containing around 1,200 semantic frames, 13,000 lexical units and over 190,000 example sentences. free

 

Statistical Tools

SPSS A widely used package of advanced statistical and analytic tools.
R Project A free package for statistical computing and graphics free
GNU PSPP A free program for statistical analysis. It is a free (as in freedom) replacement for the proprietary program SPSS and looks very similar to it, with a few exceptions. free
Sample Size Calculator An online calculator to find out the sample size based on the set confidence level and confidence interval. Useful for quantitative research sampling. free
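
The calculation behind such a sample size calculator can be sketched as follows. The page does not say which formula the online calculator uses, so this is an assumption: the sketch uses the common formula for estimating a proportion, n0 = z² · p(1 − p) / e², with an optional finite population correction. Here "confidence interval" is read as the margin of error e, and the function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence_level, margin_of_error, population=None, proportion=0.5):
    """Estimate the sample size needed to estimate a proportion.

    Assumes simple random sampling; proportion=0.5 is the most conservative choice.
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.95, 0.05))                   # 385
print(sample_size(0.95, 0.05, population=1000))  # 278
```

Different calculators may round differently or use other assumptions, so small differences from an online tool are to be expected.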

Success

Define success on your own terms, achieve it by your own rules, and build a life you’re proud to live. (Anne Sweeney)

The Effective Teacher

Impressing others by complicating simple things is easy, but simplifying complicated things to make others understand is more challenging. It requires a higher level of comprehension, empathy, and the ability to deliver meanings/messages clearly.


 

 

Writing a Scientific Manuscript

Scientific manuscript writing requires writers (researchers) to use various genres (cognitive genres – Bruce, 2008). In this post, I arrange the sample practices provided by UEfAP based on the basic structure of a scientific article. The arrangement will help you understand how particular genres are used in article sections, and allow you to compare genres employed in different sections (by doing the practices). You can start with the introduction [1], and then finish the other sections.

[1] Writing an Introduction

[2] Writing the Method Section

[3] Writing the Results Section

[4] Writing the Discussion 

[5] Writing the Conclusions

[6] Writing an Abstract


Why are scientific manuscripts rejected?

Summarizing various reasons for rejection of scientific manuscripts, Lucey (2015) proposed 12 dominant factors:

 

  1. Clarity: the paper needs to tell us what it is doing. If there are a host of good ideas all crowding each other out then in the confines of the space available in a modern journal article this is going to present a problem. Without getting into salami slicing, where a host of papers are created from one base, each differing only minutely from each other, the rubric of “One for One”, one major idea per paper, is one to live by. That way you can present a tightly argued, clear, organised paper. The other ideas go in other papers. Ask yourself – is this tightly coherent?
  2. Fit: It still astonishes me how often I see papers that are simply not within the aims and scope of the journal. How hard can it be to check whether similar papers have been published in the last few years, to read the journal homepage, to perhaps even email the editor or an associate editor? Again, this is not to say that journals shouldn’t, and perhaps even have a responsibility to, go outside the box a little, but sending a theory paper to an empirical journal, or a paper on international trade to one focusing on corporate finance suggests sloppy preparation and a lack of clarity. Check if it fits.
  3. Contribution: I mentioned salami slicing. In empirical papers this most often appears where one or two variables or approaches are changed and a new paper produced. Thus one paper uses one methodology and another a similar one, with essentially the same explanatory variable set. Really this is down to referees and editors, where the dreaded “robustness checks” are required to be shown. Let the reader feel they learned something.
  4. Triviality: Some things, if not known (can anything be known, really, in social science?) are well accepted. A paper that demonstrates already well attributed findings but in another setting, that is hard to publish. In my area this usually manifests itself as a paper that takes a concept or finding from developed or increasingly emerging markets, applies it to a frontier market and reaches the same findings. Salami slicing works this way also, or rather, not. Give the reader a solid reason for reading the paper.
  5. Coherence: Some papers are a mess. There is a good reason for, and again this is in my area, a conventional layout: introduction, previous literature, data and methodology, findings, robustness checks, conclusion and recommendations. It works to aid the writer and, more importantly, the reader in understanding the flow of the paper. Too many or too few sections, lack of integration across same, a sense where there are multiple authors of multiple voices rather than one, all these make it hard to read and hard to understand. Remember, this is a discourse, a communication. Make the paper clear.
  6. Completeness: some papers are simply not complete. For most publishers now there is a technical screening before it hits the editor; are the manuscripts, tables, figures, data etc. in the submission? Has it passed the plagiarism screening? Is it legible? Sometimes people simply forget to include material. It is uncommon but not unknown to have papers that have <to be added – Jim> or something similar. If it’s not complete, it’s not going anywhere. Complete the paper.
  7. Legibility: At times I feel like channelling Samuel L. Jackson, discussing linguistics with Brett, in Pulp Fiction. English is overwhelmingly the language of academic publishing. If the language is poorly structured, riddled with errors syntactical and lexical, then it’s going to be rejected. Get it proofread, even if you are a native English speaker.
  8. Correctness: A paper needs, especially if it is going to challenge established wisdom, to be very well constructed and to leave the reader feeling that yes there is a solid challenge. If the paper misses a whole pile of literature, has bad statistics, overambitious conclusions drawn from fuzzy data, is in general riddled with poor science, then it’s going to go down. Alternative perspectives are great, but being wrong is an alternative to being right. Check your science.
  9. Strength: This is often the case when papers are from junior researchers or are driving forward a new area. At the end we want to know – so now what do we do or where do we go? If the paper can’t tell us that, perhaps because of some of the other issues noted here or because the paper spent too long or in too rambling a way to get to the point, then it is not going to prosper. Make it strong but grounded.
  10. Replicability: Data integrity and replicability are becoming key concerns of journal editors. Some have adopted a policy of having data and commands deposited with the paper. In general however the paper should be complete in its descriptions so that someone with the same or similar data can reproduce the flow. Explain what data, where sourced, what cleaning etc.; outline the nature of the theoretical steps; explain the experiments. Many of these explanations, which can be quite long, can be placed now as supplemental appendices, and should be so done. That way the paper as such can be short and pointed and the interested replicator can go to the appendices for detail. If there is a sense that this can’t be replicated, then it’s incomplete and poorly written and will crash. Make it reproducible.
  11. Courtesy: The academy is quite small once you get into paper writing and reviewing. I have had occasion to reject a paper from a journal knowing, as I had been the reviewer just two weeks before, that the paper authors had not made any effort to address my previous concerns. That doesn’t mean agreeing with them, it does mean addressing them. Sending a literally identical paper sequentially rejected to a multiple of journals will get you a bad reputation and you WILL meet as editors or other gatekeepers people whose views you have blown off. Address the concerns.
  12. Bad Luck: ideas and topics go into and out of vogue. It is not uncommon to see two or more similar papers addressing similar areas being submitted. In that case there is an element of luck. Generally I will try to track back, via working papers dates, and see who has some claim on priority. This, by the way, is another reason why working papers and conference presentations are useful; they show intellectual priority. At any rate, Solomonic judgements are sometimes required. Be swift, but sure, I suggest.

 

Ref. https://brianmlucey.wordpress.com/2015/09/09/a-dozen-ways-to-get-your-academic-paper-rejected/

Rhetorical Functions in Academic Writing

Typical rhetorical functions used in academic writing

Descriptive

  1. Describing objects, location, structure and direction
  2. Reporting and narrating
  3. Defining
  4. Writing instructions
  5. Describing function
  6. Describing processes, developments and operations
  7. Classifying / categorising
  8. Giving examples
  9. Including tables and charts

Critical

  1. Writing critically
  2. Arguing and discussing
  3. Evaluating other points of view
  4. Comparing and contrasting: similarities and differences
  5. Generalising
  6. Expressing degrees of certainty
  7. Expressing reasons and explanations / cause and effect
  8. Expressing feelings
  9. Analysing
  10. Planning action
  11. Providing support
  12. Application
  13. Working with different voices and finding your own
  14. Taking a stance
  15. Using theory
  16. Persuading
  17. Introducing
  18. Using previous research
  19. Indicating a gap
  20. Presenting findings from statistical analyses
  21. Presenting findings from interviews
  22. Discussing limitations
  23. Drawing conclusions
  24. Recommendations
  25. Implications

Reflective

  1. Writing reflectively

Source: http://www.uefap.com/writing/function/funcfram.htm

Analyze My Writing

 

One of the most challenging (and time-consuming) tasks writing teachers face is assessing learners’ written work. It takes a lot of energy to read and analyze students’ writing.

This online text content and readability analyzer helps teachers assess learners’ written work. The website offers several useful features, such as basic text statistics, common words and phrases, readability, lexical density (I like this one most!), word and sentence lengths, and other analyses.

Visit the website: http://www.analyzemywriting.com/index.html
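
To show roughly what two of these measures mean, here is a small Python sketch that computes average sentence length and an approximate lexical density (the share of content words among all words). This is not how the website itself works; the short function-word list, the function name and the sample text are assumptions made for illustration, and a serious analysis would use a part-of-speech tagger instead.

```python
import re

# A deliberately short, hypothetical list of function words; real tools use
# part-of-speech tagging or a much longer list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to", "for",
    "with", "by", "from", "is", "are", "was", "were", "be", "been", "it", "this",
    "that", "these", "those", "i", "you", "he", "she", "we", "they", "them",
    "his", "her", "their", "our", "my", "your", "not", "as", "if",
}


def text_stats(text):
    """Return basic statistics: sentence and word counts, average sentence length, lexical density."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return {
        "sentences": len(sentences),
        "words": len(words),
        # max(..., 1) avoids division by zero on empty input.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "lexical_density": len(content_words) / max(len(words), 1),
    }


# Made-up sample text, only for demonstration.
stats = text_stats("The students wrote essays. The teacher read them carefully!")
print(stats)
```

A higher lexical density usually means denser, more information-heavy writing, which is why the measure is useful for comparing learners’ texts.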

 

 

 

How to read a scientific article quickly and efficiently

Most people read a scientific article in order, from the first section to the last. This is the most common method, but I find it inefficient.

Here is my method.

  1. Skim the abstract quickly.
  2. Go to the conclusions/summary section and identify the author’s main conclusions/arguments.
  3. Flip through the article to look at the research data supporting the conclusions; identify the most prominent data.
  4. Now, you can start reading the introduction.

Speaking and Writing

I have practiced speaking and writing a lot, but the results are still not good. Why?

Because speech and writing are products of thinking, what needs to be improved or corrected is the way of thinking (the ability to think systematically), not merely the amount of speaking or writing practice.

Good speakers and writers are able to think systematically and organize their ideas well, so that their speech and writing can be understood by the audience*.

We can picture ideas as Lego bricks: colorful and very attractive, but with no visible shape unless they are assembled. If ideas are arranged well, it is very easy for the audience to take in the shape of the argument we present.

Where do we learn to arrange ideas well, so that our ideas take on attractive shapes?

Listen a lot and read a lot (of good material). These two habits provide cognitive ‘templates’ that make our speech and writing better and better.

* In Indonesian, the correct word is ‘audiensi’, not ‘audiens’.

Writing the Methodology

Read this first – Writing the Introduction

How should you write the section? What are the important points you should include in it? OK, let’s use the Q&A method. Here are the key questions you need to answer to complete the methodology part of your scientific article.

First, restate the research objective [if required] ⬇️

What is the research method? [qualitative/quantitative/mixed methods]

Define/explain the research method.

Why do you use the research method?

Discuss the advantages of the method and how it can help you collect research data to answer the research question.

What is the research design? Define/explain the research design.

Why do you use the research design? Justify it.

What are the advantages of using this design? Are there any similar/previous studies which used the design?

What are the research instruments you employ to collect the research data? Define and explain.

Why did you use such instruments? Justify.

What are the advantages and the weaknesses? (In some cases, you need to explain how you address the weaknesses of the research instruments)

What is the type of the data collected? Primary or secondary data?

Who/what are the sources of the research data? Define/explain. Present the details.

How did you collect the data? (setting, time/period, etc.)

Overall, how did the data collection go? Were there any problems that could affect the data quality?

Discuss the quality of the collected data.

Write a sentence stating that the results of data collection will be elaborated in the next section [this is one of the effective strategies you can use to maintain the cohesiveness and coherence of your manuscript].

Arrange the sentences [the answers to the above questions] into paragraphs. Again, pay attention to cohesiveness and coherence. Use linking words and phrases.

Add some details [if required] and polish.

Writing an Introduction

Writing an introduction is not easy. Writers (researchers) often face the challenging question of how to write the introduction part of a scientific article. In my workshop, I use a Q&A method to help writers (researchers) identify key points which should be included in the introduction. If you can answer most of these questions, then you are ready to write the introduction part of your scientific article. What you need to do next is to arrange the answers to these questions into paragraphs. Pay attention to coherence and cohesion. So, here are the questions. Ready?

Questions to address in the Introduction ⬇️

What is the general topic of the manuscript?
Why is the topic important?
Are there any data or research findings related to the topic?
What do the data or research findings say about the topic?
Are there any grand theories related to the topic?
What do the theories propose?
Are there any practical aspects/implications or regulations related to the topic?
How does the topic affect practical aspects? Are there any practical implications?
How are the regulations related to the topic?

Let’s be more specific ⬇️

What is the main research question? Or, what is the specific issue the article addresses?
What is the main objective of the article?
Why is it important to address the main research question?
Why is it important to achieve the main objective?
What do available data or research findings say about the specific issue?
Are there any grand theories related to the specific issue?
What do the theories say about the specific topic?

Great! Now you are ready to write the introduction part of your article.
