Toward a Japanese “linguistics of speech”: collocation in the BCCWJ

by Kieran Maynard (2010)

Advisor: Dr. William Kretzschmar

The University of Georgia LING 4080/6080

Keywords: Computational linguistics, Japanese, NLP, corpus linguistics, electronic corpora, semantic analysis

Corpus linguistics will soon benefit from the public release of a new electronic corpus of written Japanese. This study attempts to lay the foundation for an application of Kretzschmar's (2009) “linguistics of speech” to Japanese speech data in the aggregate. We will first characterize the Kotonoha Project, describe our theoretical foundations and some issues specific to Japanese, then apply Sinclair's (2004) analytical methods to Japanese corpus evidence in search of significant collocations and the distributional pattern of the linguistics of speech.

The Kotonoha Project

The advent of computer processing and storage has made the compilation of massive language corpora possible, and the Internet has made them accessible to researchers. Computers enable analysis of large bodies of text, and these analyses have produced startling findings. While much work with corpora has been done with English, Japanese corpora have also been compiled and made available on the Web.

The National Institute for Japanese Language and Linguistics (NIJL) in Tokyo is compiling a “Balanced Corpus of Contemporary Written Japanese” (BCCWJ) slated to be opened to the public in 2011. According to Maekawa's (2007) estimate, the corpus will comprise approximately 100 million words “selected randomly from well-defined statistical populations covering [a] wide range of written texts” (Maekawa 2008). The NIJL defines a “balanced corpus” as one that “as accurately as possible represents contemporary Japanese” (gendai nihongo no jissi no dekiru dake seikaku na syukuzu to naru (“Kokuritsu”)). Previous studies of written Japanese have dealt with material either too old (e.g., copyright-expired literature), not sampled randomly, or skewed in its distribution. Newspaper writing, for instance, is produced for institutions that seek to minimize variation, and Internet writing lacks categorization and copyrighted works.

The BCCWJ will combine three sub-corpora: (1) the “publication” or “production” sub-corpus of 34.7 million words, which “consists of samples extracted randomly from the statistical population covering the whole body of books, magazines, and newspapers published [in Japan] during 2001-2005” (Maekawa 2008); (2) the “library” or “circulation” sub-corpus of 30 million words, sampled from “the whole [of] books registered in at least 13 public libraries in the Tokyo Metropolis” (ibid.); and (3) the “special-purpose” or “out-of-population” sub-corpus of 35 million words, comprising “various special purpose mini corpora” of about 5 million words each, “includ[ing] texts of governmental white papers, Internet text…, minutes of the [N]ational [D]iet, school textbooks, and best-selling books of the past 30 years” (ibid.).

My analysis uses the online demonstration version of the BCCWJ, which comprises “[a]s of September 2007… the full-text query of the 10 million words [of] texts that are copyright cleared [that] are publicly available on the web” (ibid.). The demonstration BCCWJ gives concordance lines for any search term, to a maximum of 500. Additional parameters may be entered in regular language to refine the search on either side of the node. The classifications given to concordance lines are author, author's decade of birth, author's gender, genre, book title/source, subtitle/classification, volume number, compiler, etc., publisher, and misc. notes. Though it is possible to copy the concordance lines into another program, statistical analysis cannot be carried out on the demonstration BCCWJ page; the eventual public release of the corpus will enable more rigorous statistical treatment of the data in this paper.
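Once concordance lines have been copied out of the demonstration page, a keyword-in-context view like the one described above can be approximated offline. The following is a minimal sketch in Python, not the BCCWJ's own search code. The function name `kwic`, its `width` parameter, and the sample string (two fragments from Appendix 2 joined together) are assumptions made for illustration; the default `limit` of 500 simply mirrors the demonstration's maximum.

```python
def kwic(text, node, width=20, limit=500):
    """Return up to `limit` (left, node, right) context triples for `node`."""
    lines = []
    start = text.find(node)
    while start != -1 and len(lines) < limit:
        left = text[max(0, start - width):start]
        right = text[start + len(node):start + len(node) + width]
        lines.append((left, node, right))
        start = text.find(node, start + 1)
    return lines

# Two fragments taken from Appendix 2 (lines 27 and 45), joined for the demo.
sample = "その様子を肉眼で見ることは出来るものでしょうか?毛じらみは肉眼で確認できます。"
for left, node, right in kwic(sample, "肉眼", width=5):
    print(f"{left} [{node}] {right}")
```

Because the search works on characters rather than words, it sidesteps the word-boundary question entirely, at the cost of leaving all segmentation to the analyst.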

Words and linguistic features

What are the units of meaning in Japanese? The “word… does not reign unchallenged as the basic unit of language” (Sinclair 2004: 25)—other concepts, like the morpheme, have gained currency, yet the morpheme is often too small a unit for the study of linguistic variation in text. In Japanese text, where word boundaries are not differentiated by orthography, what counts as a word?

John Sinclair (2004) has proposed a model for compound lexical items in the structural analysis of English. In the chapter “The Search for Units of Meaning,” he makes the case for compound lexical items with four major structural categories: collocation, colligation, semantic preference, and semantic prosody. He posits a continuum in the lexis between the “open-choice principle” and the “idiom principle,” exemplified by the “terminological tendency” of words “to have a fixed meaning in reference to the world” (29) and the “phraseological tendency” of words to “go together and make meaning by their combinations” (29), respectively. He hypothesizes that (29-30):

…the notion of a linguistic item can be extended, at least for English, so that units of meaning are expected to be largely phrasal. Some words would still be chosen according to the open choice principle, but probably not very many, depending on the kind of discourse. The idea of a word carrying meaning on its own would be relegated to the margins of linguistic interest, in the enumeration of flora and fauna for example.

Sinclair has used the mid-1995 Bank of English 211 million word corpus to gather collocations for phrases and words to show that words often appear next to other words (collocate) and in certain grammatical patterns (colligate), and that an analyst can further abstract a “semantic preference” and “semantic prosody” (2004: 32-3). Even a common word like place (as it might appear in a sentence like “…She came over to my place with a friend…” (38)) Sinclair describes as “a compound lexical item which has a semantic prosody 'informal invitation,' a semantic preference for 'local travel' which is realized by colligation with a verb of movement and optionally a directional adverb, with come and over as typical collocations” (38).

Distributions of linguistic features and The Linguistics of Speech

William A. Kretzschmar, Jr. in The Linguistics of Speech (2009) goes beyond Sinclair's “phraseology” and defines “linguistic features” based on the work of Ferdinand de Saussure as “anything we can identify as an entity or unit having to do with what people say” (53). A linguistic feature is often larger or smaller than a word, which is in itself a tricky concept to define, but one that continues to influence our perceptions of language (54):

Linguistic features of speech, concrete entities, thus are commonly taken to be different words used for the same referent (synonyms), or alternative morphs or phones used as components of what we identify as the same word, or alternative arrangements of words in what we recognize to be sequences with equivalent meaning or organization. For Saussure, “identity” comes from such acts of recognition, as when we consider the word “messieurs” to be the same word even given variations in “delivery and intonation” by different speakers…

John Firth and others have shown that words derive their meaning from context, not the other way around. Where and how often they occur in texts, their distribution, is then very important. Michael Stubbs's (2001) study of English, described in Kretzschmar (2009), used corpora to calculate the rate of co-occurrence for word forms (“node” words) and their collocates, and found that 90% of node words appeared near their top collocate at least 2% of the time, which is still 250 times the probability of co-occurrence by chance. Thus Kretzschmar (2009) posits that “words are not deployed randomly in speech, or evenly spaced, but instead they normally occur in clusters… in proximate association with other words as collocates” (154); therefore “it is… a normal feature of language in use that any given word, when considered as a node word, is likely to have multiple collocates with unexpectedly high rates of co-occurrence” (154-5).

Kretzschmar (2009) has proposed a new model for the study of language in use, called the “linguistics of speech” as a counterpart to the academic North American “linguistics of linguistic structure” (4), and shown that the distribution of linguistic features in speech and writing is non-linear. When organized into types and tokens and plotted by frequency, linguistic features always display an asymptotic hyperbolic curve, or “A-curve” (197). This distribution has been described before as the “80/20 Rule,” which predicts 20% of all types will account for 80% of all tokens. Vowel realizations, words, collocations, etc. all follow this distributional pattern, as Kretzschmar (2010) explains (20-1):

…the 80/20 Rule for such non-linear distributions (whether the actual percentages are 90/10 or 70/30) tells us that we will always find one or a few constructions that account for the great majority of the instances for the feature under study, and that there will be a large number of variant constructions for the feature that account for a small minority of the instances…

Systems characterized by such a distribution have been observed in other sciences, and now linguistics; they are known as “complex adaptive systems.” Kretzschmar explains (2010: 5):

This more-or-less 80/20 relationship is no mere curiosity but a sign that speech, language as we use it, behaves as a complex system. Complexity in this specialized sense (not just with the usual meaning 'complicated') is a property of many natural phenomena characterized in mathematical descriptions by "curves without tangents," continuous nondifferentiable functions--in other words, this sort of complexity is characterized by A-curves.

The model of the “linguistics of speech” is built on this understanding of the nature of “emergent order” in language, and offers a new way to study speech data in the aggregate.

The orthography problem

The Japanese writing system poses a bit of a problem in the linguistic analysis of Japanese. Written Japanese regularly appears in a combination of four scripts: kanji, hiragana, katakana, and rōmaji. Hiragana and katakana are basically phonemic systems of characters that represent syllables. Rōmaji are Roman characters used primarily in two competing systems (Hepburn and Kunrei) to record Japanese phonemically. Kanji, or Chinese characters, are less straightforward in their phonetic representation.

First introduced to the West by Jesuits and other missionaries, the characters were imagined to represent concepts divorced from the sounds of spoken language, a mistaken perception that persists to this day. Kanji are usually used to represent the vast heritage of Chinese loan words in Japanese, but also many native words (including 19th- and 20th-century coinages using Sino-Japanese morphemes), some Buddhist terms (from Sanskrit, Pali, etc.), nativized Western loans (from Portuguese, Dutch, etc.) and even relatively recent loans like pēzi 'page'. Sometimes different kanji are used to differentiate homonyms, analogous with English <sail> and <sale>, while at other times different morphemes or words considered related by sense, paradigm or tradition will be written with the same kanji. For example, the native word mato 'target' is written with the same kanji as the Sino-Japanese morpheme teki-, and likewise the native naka 'inside' is written with the same kanji as the Sino-Japanese -tū in tekitū 'strike home'. Though their phonetic value is often opaque, kanji are certainly not ideograms, as their primary function in most Japanese text is the same as that of an alphabetic or any other writing system for human language: to record, even at a rough approximation, the sounds of spoken language. Kanji have at other times been described as morphemic or logographic; DeFrancis (1984) has proposed the term “morphosyllabic.”

While the Japanese government sets standards for the use of kanji, variation in use and their morphosyllabic nature may create ambiguity in which word kanji are intended to record. For this reason a potentially ambiguous search term consulted in the corpus in kanji must be checked against its context to confirm its “reading”: the spoken word, morpheme or syllable it represents. Some terms need to be searched in multiple orthographic forms (possible through the use of Boolean operators in the demo BCCWJ), as words may be written interchangeably in different systems according to tradition, visual appeal, etc.
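The need to search several orthographic forms at once can be illustrated with a regular expression standing in for the demonstration's Boolean operators, whose exact syntax is not reproduced here. This is a sketch under stated assumptions: the pattern name `ITAI_ME_NI_AU` is hypothetical, and the four spellings of au it accepts (あ, 会, 合, 遭) are simply those that occur in the Appendix 3 lines, not an exhaustive list.

```python
import re

# Spellings of "au" attested in Appendix 3: あ (hiragana) and the kanji 会, 合, 遭.
# A character class acts like a Boolean OR over the variant forms.
ITAI_ME_NI_AU = re.compile(r"痛い目に[あ会合遭]")

lines = [
    "ドライブを値段だけで選ぶと痛い目にあいます。",  # Appendix 3, line 8
    "ネットしてたら痛い目に会うぞ!!!",              # line 27
    "舞い上がってると痛い目に遭うから、ほどほどにね",  # line 30
    "結果、痛い目に合いました。",                      # line 43
    "少し痛い目を見たようだ",                          # line 21 (itai me o miru)
]

hits = [s for s in lines if ITAI_ME_NI_AU.search(s)]
print(len(hits))  # 4
```

Each hit still has to be read in context, since the same kanji can record other words.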

The NIJL's sampling method is actually based on characters rather than words. Explaining the word-estimation formulas for the Corpus of Spontaneous Japanese (CSJ) produced by the NIJL, Maekawa (2007) warns, “…word boundary in Japanese is heavily theory dependent and hence is not reflected in ordinary orthography.” It may be argued, though, that the orthography itself exacerbates or even causes this problem, as word boundary in any language, as demonstrated by Sinclair (2004) and Kretzschmar (2009), is “heavily theory dependent.”

Naked eyes: ragan and nikugan

The exemplary expression “naked eye” Sinclair (2004) analyzes thus (34):

The speaker/writer selects a prosody of difficulty applied to a semantic preference of visibility. The semantic preference controls the collocational and colligational patterns, and is divided into verbs, typically see, and adjectives, typically visible. With see, etc., there is a strong colligation with modals – particularly can, could in the expression of difficulty – and with the preposition with to link with the final segment. With visible, etc., the pattern of collocation is principally with degree adverbs, and the negative morpheme in-; the following preposition is to. The final component of the item is the core, the almost invariable phrase the naked eye.

As he points out, the phrase is semantically opaque; “unclothed organ of sight” (2004: 31) is not enough to deduce the meaning, and “naked in the collocation naked eye could equally well mean… 'without spectacles, contact lenses, etc.'” (31). In fact Japanese has just such a word, ragan, a Sino-Japanese compound comprising the morphemes /ra/ 'naked' (e.g. ratai 'a nude') and /gan/ 'eye' (e.g. gankyū 'eyeball'), defined as “the eye without the aid of corrective lenses” (Kondō & Takano 2001). The BCCWJ contains 20 instances of ragan, which collocates with siryoku 'eyesight' 11 times (4 times at N+1, in the compound ragansiryoku 'uncorrected vision'; see appendix 1). As ragan collocates 9 times with the instrumental particle de at N+1, as in ragan-de mi-te (ragan-Instr see-gerund), we can posit colligation with de. The semantic preference we may describe as “visibility”; visual activities like eiga 'film,' pasokon 'computer', and yomi 'reading' appear to the left of the node. The semantic prosody, however, differs from English naked eye. Numbers used in the prescription of glasses appear in 11 lines, and 4 lines contain warui 'bad' as a comment on me 'eye': ragan suggests a semantic prosody of ophthalmology.
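Counting what stands at N+1 can be sketched as follows, using only the eight ragan lines reproduced in Appendix 1 (trimmed around the node), so the totals will not match the 20-line figures reported above. The function name `right_context` and its parameter `n` are hypothetical. The count is made on characters, not morphemes, which is an important assumption: a following で may be the particle de or the first character of the copula desu, so the two-character window is shown as a rough check.

```python
from collections import Counter

def right_context(lines, node, n=1):
    """Count the n characters found immediately after each occurrence of node."""
    counts = Counter()
    for line in lines:
        pos = line.find(node)
        while pos != -1:
            after = line[pos + len(node):pos + len(node) + n]
            if after:
                counts[after] += 1
            pos = line.find(node, pos + 1)
    return counts

# The eight ragan lines shown in Appendix 1, trimmed around the node.
appendix1 = [
    "私は裸眼0.05です。",
    "「近視」とは,裸眼視力1.0未満のもので",
    "でも、裸眼のままだと辛いです。",
    "あなたの裸眼にもよって",
    "映画・運転以外は、裸眼で通しています。",
    "人を探すとき以外は裸眼です。",
    "今までは裸眼です。",
    "なるべく裸眼でいたいのですが",
]

print(right_context(appendix1, "裸眼", n=1).most_common(1))  # [('で', 4)]
print(right_context(appendix1, "裸眼", n=2)["です"])          # 2
```

Separating particle de from copula desu reliably would require morphological analysis, which this sketch does not attempt.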

The term naked eye is more akin to the Japanese nikugan. The morpheme /niku/ 'flesh' (e.g. nikutai '(the physical) body') is combined with /gan/ 'eye' in /nikugan/, defined as “the eye possessed of the human body; natural eyesight without the use of a telescope, microscope, etc.” (Daijisen 1998). The BCCWJ contains 86 instances of nikugan, which collocates with miru 'to see' (in its various forms) 69 times (see appendix 2). At N+1 the instrumental particle de appears in 70% of the examples (two more simply separate nikugan and de with other instruments in a list). Collocation with the suffix teki makes nikugan an adjective in 14 cases (in the sense 'macroscopic'); in 71% of these cases (8.6% of total) nikugan appears in the compound nikugantekiketunyō 'macroscopic hematuria.' An 80% collocation with miru 'see' (conflating inflected forms) and an additional 14% with kansatu 'observation' once again suggest a semantic preference for “visibility.” Of the collocations with miru, 72% involve possibility, divided 26:24 between expressions of the possibility and impossibility of seeing something with the naked eye; it seems nikugan carries a similar semantic prosody to naked eye, described by Sinclair (2004) as “difficulty” (33), which “may just be hinted at by a modal verb such as can or could or more directly by a negative with 'visibility'” (43). For example:

nikugan-de mi-e-ru wakusei

naked eye-Instr see-Pot-nonpast planet

'planet visible to the naked eye'

Failure to see something with the naked eye accounts for 29% of all cases, as in:

nikugan-de-wa mi-ru-koto-ga deki-nai mikuro-no-sekai

naked eye-Instr-Top see-that-Nom can-nonpast neg. micro-Gen-world

'microscopic world that cannot be seen by the naked eye'

The use of nikugan in the affirmative seems to imply that the visibility is unusual, as in:

Tusima-kara taigan-no-kankoku-o nikugan-de nozomu-koto-ga dekiru

Tsushima-from opposite shore-Gen-Korea-Acc naked eye-Instr see-thing-Nom possible

'from Tsushima Island the Korean shore can be seen with the naked eye'
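The main proportions reported for nikugan follow directly from the raw counts given above (86 instances, 69 with miru, and a 26:24 split between possibility and impossibility). The short check below is only a sketch of that arithmetic; the helper name `pct` and the variable names are hypothetical, and rounding to whole percentages is an assumption about how the figures were reported.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)

total_nikugan = 86            # instances of nikugan in the demonstration BCCWJ
with_miru = 69                # instances collocating with miru 'see'
possible, impossible = 26, 24 # possibility vs. impossibility expressions

print(pct(with_miru, total_nikugan))          # 80
print(pct(possible + impossible, with_miru))  # 72
```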

Phraseology: itai me ni au

The next phrase we will examine, itai me ni au 'run into trouble', exhibits a strong phraseological tendency. Daijisen (1998) lists the entire phrase, defined as “experience pain or suffering; have an awful experience,” along with the synonymous phrases hidoi me ni au and itai me o miru. The phrase me ni au 'have an experience' appears 600 times in the BCCWJ, and approximately 8% of these are itai me ni au. The phrase itai me occurs 65 times (see appendix 3). Of these, 72% are in some permutation of the phrase itai me ni au:

yudan-si-te-i-ru-to, ita-i me-ni a-u-kara-ne! (24)

neglect-do-gerund-is-nonpast-Cond, painful-nonpast experience-to meet-nonpast

''cause if you don't pay attention you'll run into trouble!'

A further 26% are in itai me o miru:

te-o-da-su-to ita-i me-o-mi-ru (61)

hand-Acc-send out-nonpast-Cond painful-nonpast experience-Acc see-nonpast

'if you make a move you'll be in trouble'

In colligation itai me ni au shows very little variation in particle use: 46 cases of itai me ni and 1 case of itai me ni mo. The 17 cases of itai me o miru show the variation common with the direct object particle o in general: 5 cases of elision (61: itai me miru) and 1 case of topicalization (62: itai me wa mite mo sikata nai).
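The particle tally described above can be sketched by looking at the single character that follows 痛い目 in each concordance line. This is an illustration over five lines from Appendix 3, not a recount of the full data; the names `AFTER_ITAI_ME` and `particle_after` are hypothetical, and treating exactly one following character as the particle is a simplifying assumption. Line 65 (痛い目で見られる) shows why each match still needs reading in context, since it appears to be a different expression from the two phrases under study.

```python
import re
from collections import Counter

# One optional character after 痛い目: a particle (に, を, は, で) or nothing (elision).
AFTER_ITAI_ME = re.compile(r"痛い目([にをはで]?)")

def particle_after(line):
    """Return the particle following 痛い目, "(none)" if elided, or None if absent."""
    match = AFTER_ITAI_ME.search(line)
    if match is None:
        return None
    return match.group(1) or "(none)"

sample = [
    "友達の保証人になって痛い目みてるから無理",          # Appendix 3, line 1
    "ドライブを値段だけで選ぶと痛い目にあいます。",        # line 8
    "少し痛い目を見たようだ",                            # line 21
    "油断してると、痛い目にあうからねぇ!",                # line 24
    "銀魂ファン全体が痛い目で見られるのがつらいです。",    # line 65
]

tally = Counter(particle_after(s) for s in sample)
print(tally["に"], tally["を"], tally["(none)"], tally["で"])  # 2 1 1 1
```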


Having looked at three linguistic features, we should notice a recurrent distributional pattern: 70% collocation with de and 80% collocation of miru with nikugan, 72% possibility expressions versus others within the miru collocates, and 72% collocation of itai me with ni au within all itai me instances. These distributions exemplify the 80/20 Rule; in other words, the A-curve distribution is “robust” enough to appear even in our rudimentary statistical analysis (see appendix 4). No matter what linguistic feature is chosen, be it a particle (morpheme), word or phrase, the BCCWJ data exhibits significant clustering as expected in the linguistics of speech. Linguists should feel confident they will make new discoveries when the BCCWJ data is released in full.
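One way to make the 80/20 comparison concrete is to rank types by frequency and ask what share of tokens the top fifth of types covers. The sketch below does this for the top-level forms of miru listed in Appendix 4. The function name `top_share`, the 0.2 cutoff, and the rounding rule used to turn a fraction of types into a whole number are all assumptions for illustration, and eight types are far too few for a rigorous test of the A-curve; shares are computed over the counts as listed.

```python
def top_share(counts, type_fraction=0.2):
    """Share of tokens covered by the most frequent `type_fraction` of types."""
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * type_fraction))  # at least one type
    return sum(ranked[:k]) / sum(ranked)

# Top-level forms of miru 'see' and their counts as listed in Appendix 4.
miru_forms = {
    "mie-": 41, "miru": 10, "mite": 6, "mirare-": 3,
    "mita": 3, "mitukerare-": 2, "mikake": 1, "mi": 1,
}

print(f"{top_share(miru_forms):.0%}")  # 76%
```

With eight types the cutoff selects the two most frequent forms (mie- and miru), which together cover roughly three quarters of the listed tokens.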

Works cited

  1. Aronoff, Mark, and Kirsten Anne Fudeman. "Words and Lexemes." What Is Morphology? Malden, MA: Blackwell Pub., 2005. Print.
  2. Barfield, Andrew, and Henrik Gyllstad. Researching Collocations in Another Language: Multiple Interpretations. New York: Palgrave Macmillan, 2009. Print.
  3. DeFrancis, John. "The Ideographic Myth." The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii, 1984. Web. 2 Dec. 2010.
  4. Digital Daijisen. Tokyo: Shogakukan, 1998. Web. 2 Dec. 2010.
  5. Hasegawa, Yoko. "The Tense-aspect Controversy Revisited: The -ta and -ru Forms in Japanese." Pragmatics in 1998: Selected Papers from the 6th International Pragmatics Conference. Ed. Jef Verschueren. Vol. 2. Antwerpen: International Pragmatics Association, 1999. 225-40. Web. 3 Dec. 2010.
  6. "Kokuritsu Gengo Kenkyūsho No Gengo Kōpasu Seibi Keikaku Kotonoha." National Institute for Japanese Language and Linguistics. Web. 03 Dec. 2010. <http://www.ninjal.ac.jp/kotonoha/>.
  7. Kondō, Ineko, and Fumi Takano. Puroguresshibu Waei Chūjiten ["Progressive" Japanese-English Dictionary]. 3rd ed. Shogakukan, 2001. Kotobank. Web. 5 Dec. 2010.
  8. "Kotonoha Gendai Nihongo Kakikotoba Kinkō Kōpasu Kensaku Demonsutorēshon." National Institute for Japanese Language and Linguistics. Web. 11 Nov. 2010. <http://www.kotonoha.gr.jp/demo/>.
  9. Kretzschmar, William A., Jr. "The 80/20 Rule in English Grammar." Proc. of NAES-FINSSE 2010, Oulu. Web. 5 Dec. 2010.
  10. Kretzschmar, William A., Jr. The Linguistics of Speech. Cambridge: Cambridge UP, 2009. Print.
  11. Maekawa, Kikuo. "Balanced Corpus of Contemporary Written Japanese." Proc. of The 6th Workshop on Asian Language Resources, 2008, Hyderabad, India. Web. 1 Dec. 2010.
  12. Maekawa, Kikuo. "KOTONOHA and BCCWJ: Development of a Balanced Corpus of Contemporary Written Japanese." Corpora and Language Research: Proceedings of the First International Conference on Korean Language, Literature, and Culture. Seoul, 2007. Web. 4 Dec. 2010.
  13. Maekawa, Kikuo. "Quantitative Analysis of Word-form Variation Using a Spontaneous Speech Corpus." Proc. of Corpus Linguistics 2005, Birmingham. Web. 4 Dec. 2010.
  14. Sano, Motoki, and Takehiko Maruyama. "Lexical Density in Japanese Texts: Classifying Text Samples in the Balanced Corpus of Contemporary Written Japanese (BCCWJ)." Proceedings of ISFC 35: Voices Around the World. Ed. Canzhong Wu, Christian M.I.M. Matthiessen, and Maria Herke. Sydney, 2008. Web. 4 Dec. 2010.
  15. Sinclair, John, and Ronald Carter. "The Search for Units of Meaning." Trust the Text: Language, Corpus and Discourse. London: Routledge, 2004. Print.
  16. Thomson, Elizabeth A. "Theme Unit Analysis: A Systemic Functional Treatment of Textual Meanings in Japanese." Functions of Language 12.2 (2005): 151-79. Web. 4 Dec. 2010.
  17. Tsujimura, Natsuko. An Introduction to Japanese Linguistics. Malden, MA: Blackwell Pub., 2007. Web. 2 Dec. 2010.

Appendix 1: ragan

1をかえてもらおうと思っています。  私は裸眼0.05です。  1番見えるようにしてと
4以上の疾病異常である。2 「近視」とは,裸眼視力1.0未満のもので矯正視力検査の結果
6あんまり効かないので、、、^^; でも、裸眼のままだと辛いです。 目が悪いのでコンタ
7も 1.2が限界みたいです。  あなたの裸眼にもよって  合わせられる度が違うと思い
10? 0.3なのですが、映画・運転以外は、裸眼で通しています。 メガネが似合わない・コ
17しかし、車とパソコンと人を探すとき以外は裸眼です。 もちろんテレビも。 ちゃんと2m
18ほうがいいことってなんですか? 今までは裸眼です。 あまり強いものだと、慣れるまで 
19タクトが面倒・怖いという理由で、なるべく裸眼でいたいのですが・・・。 現在視力0.1

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 2: nikugan

2って観察しよう 昔の人になったつもりで,肉眼,水滴レンズ,ルーペで身近なものを観察し
4く知っていて郭清しなければなりません。 肉眼的(触診や視診)には、転移の有無がわから
6れば、患児は数日間活動を抑えてもよい。 肉眼的血尿発作回数が運動制限によって減ること
7う。 「腎障害に対する特殊な治療はない。肉眼的血尿発作はひとりでになおるものであり、
12れなりに検査、治療してると思うので。  肉眼的血尿はなくとも、3+は安心できる値では
15     素晴らしい展望です♪     肉眼的には申し分ない、展望ですが、     
18開け、コックピットから半身を乗り出すや、肉眼のみを頼りに照準を試みたのだ。 高度はあ
19魂  = 太陽 =5次元★闇の力の根源は肉眼に見える月ではなく、肉眼に見えない黒い月
24着が近づく。 着弾時の水柱が、はっきりと肉眼で視認できる位置に噴き上がり、海水を伝わ
26ウン管は元々残像を利用しているのです。 肉眼で見ると一枚のようですが、レーザーでビー
27処から発射されるのでしょうか?その様子を肉眼で見ることは出来るものでしょうか? 打上
28 地球の表面の三分の二は海だし、私たちが肉眼で見ることのできるのは海面というただの皮
29相関図に気付くはずがない。 彼らが実際、肉眼で見ることのできた渦の形といえばなんだろ
30学を超えなければならないのです。 人間が肉眼で見ることができる宇宙の星は、何百光年、
37治の頃からのようです。  七曜というのは肉眼で見える惑星の火星・水星・木星・金星・土
38うな場所があるのか。 人間の身体の中に、肉眼で見えるような形のある構造がいくつもある
39る】 仏といっても浄土といっても、それは肉眼で見えるものではありません。絵像や木像の
40輝く星が見えたのですが、 もしかしたら、肉眼で見えると話題になっている、彗星でしょう
45でしょうかね???????? 毛じらみは肉眼で確認できます。もし毛じらみを見つけられ
46ャーでも解ります。 PCのケース開けたら肉眼で確認できます。
49がった空に浮かぶ黒い機影が、地上からでも肉眼でハッキリと確認できた。「撃て! 撃て!
50らないのでしょうか? 再生はできますが、肉眼でカビが見えるなら 画質はひどいですよ。
52) 生物の部分を拡大する 私たちはふだん肉眼でものを見ているが,詳しく観察するために
55:0.025倍)(B) 連星とその質量 肉眼では1つにしか見えない恒星でも望遠鏡では
57てみよう。 人体の最小単位は細胞である。肉眼では見ることができないミクロの世界から出
64だくのです。 前にも述べましたが、仏壇も肉眼では見えない浄土をあらわそうとしたもので
67っていると思われますので、どちらか一つは肉眼では見えないかもしれません。 条件が良い
74あった。 十一月末のレントゲン写真では、肉眼ではもうほとんど見えないくらいまで消えて
76利用〈微生物とよばれるもの〉 生物の中で肉眼ではほとんど見えず,顕微鏡や電子顕微鏡で
78テラノーバという寄生虫らしいのですが、 肉眼ではっきり確認できるものなのでしょうか?
79ンデジで撮った夜景。今度は暗すぎる?! 肉眼ではすごく大きく見えました!!そして、と
80報により 高台へ ゆきました。  肉眼では うっすら 見えたのですが・・・。う
83。陽性か陰性かの最終判定は検査技師たちの肉眼で。(同)献血は「危険物」? さて、血液
85し)かの判定は、最終的には検査技師たちの肉眼が下すそうだ。 と同時に、別の検体は「N
86照射方法 照射方法は2種に大別されます。肉眼、直視下でハンドピース(CO2)またはロ

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 3: itai me ni au

1けられない」「一度、友達の保証人になって痛い目みてるから無理」でしょうか。 兎に角、金
2にも三千にも見えたのである。 木曾軍に手痛い目に合わされた信長は木曾軍に対して恐怖をお
3情的にからんできます。 他の営業マンから痛い目にあっているので、お返しのつもりとも受け
4  こんな回答を書き込むと、以下のような痛い目にあいます。  上に挙げた例のように物理
5び人 ではないんじゃないでしょうか?  痛い目みますよ  既婚者はパートナーのところに
6入れとくのが無難。[ 印象 = 軽視して痛い目に合わされやすい ]●7枠14番=リキッ
7走ってましたから。  サンライズは宝塚で痛い目にあってましたのでいつかはと思ってました
8もあります。 ドライブを値段だけで選ぶと痛い目にあいます。 お勧めはRAM不要ならパイ
13いけないのです。なぜ? 今痛くても、後で痛い目をしなくてすむからなんです。今注射をしな
16仕事はありません。 世の中なめてかかると痛い目にあいますよ。   兜町のクワガタムシ
21。娘を殴ったら、お袋さんが騒いでな。少し痛い目を見たようだ」 亜紀子は目を伏せた。 あ
24もっこり』 だけどぉ・・・油断してると、痛い目にあうからねぇ!  (ノ∇^*) キャハ
25 ちなみに旦那は以前、風俗で病気移されて痛い目を見てるので、風俗には行かないと言ってま
27!! そんな事も知らずに、ネットしてたら痛い目に会うぞ!!!  しかしまだそのネタして
28 ま、世の中平等なら、そんな会社もいずれ痛い目にあうでしょうね。 無謀な勤務実態は明る
30混じりに答えた。「ただ、舞い上がってると痛い目に遭うから、ほどほどにね」 そしてスキッ
31かり気にして、男と男てえ関係を忘れてると痛い目に会うという…」 「男と男ねえ、なるほど
34たからです」 梶田の過去を探りまわるな。痛い目に遭うぞ。そこまではいい。だが、問題はそ
35っているのですか―」 「以前に、一度、手痛い目に会わされたことがある」 「力の加減を知
36き人間が自分に知恵が足りないことによって痛い目にあうのである。 こういったことを言って
37らって結婚すれば~? そうですよね~  痛い目を見る前に、 目がさめて欲しいですよね・
39」 典善も、小さく口元をゆるませた。 手痛い目に会わされたと言ってはいるが、この男も乱
40迷惑かけます。 だいたい貧血をあまくみて痛い目にあうひと多いです。 とかいって、私は「
43で登録して一人と知り合いましたが 結果、痛い目に合いました。 結婚したいという目的は同
47いいかげんな応対をしていては、あとで必ず痛い目に合う。親しみをもって近づく あるソフト
48 体がそれを覚えるまでどんな苦労や失敗や痛い目にあったか、裸馬の背骨で腿がすり切れたり
49のがわからないんでしょうか?? ちょっと痛い目見たほうがいいですねぇ~。ドラ1で入って
50は、緊張した声で、「何かあったのか」 「痛い目に遭わされたようです。そう言っていました
51か? 両方!! 外見だけ見て判断してると痛い目に合うし、 中身だけだと、飽きる。 だか
52軍のマスコミ操作報道により戦争につっ走り痛い目にあったことを忘れたのか! 一日も早く基
54に慣れていますね」 「記者にはいろいろと痛い目に遭わされているからな。大切なのは、こち
55じゃあ、誰のだと言うんだ。答え次第では、痛い目に合うぞ」 「伍長」隅倉は荒巻に言った。
59けます。  若者はまだ未来があるので将来痛い目に合うだろうからいいけど、  若者に限定
61。  彼女や人妻に言い寄られて手を出すと痛い目見るのでやめましょう。パートナーを裏切る
63さらではないか。 西洋でも、これらにより痛い目に合ったという事例が事欠かないのだろう。
65ね。 こういう一人の為に銀魂ファン全体が痛い目で見られるのがつらいです。

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 4: distribution of nikugan collocations

Mi- 'see' 69
    mie- 41
        miena- 17
            mienai 15
            nikugan de wa mienakute mo 1
            iu hodo kōhinsitu ni wa mienakatta 1
        mieru 13
        miezu 2
        mieta 2
        miemasen 1
        miemasu 1
        miemasita 1
        nikugan de wa mienikui 1
        nikugan de wa mienu 1
        mie (end of line) 1
        nikugan de no kansatu ni kurabete dono yō na miekata no tigai 1
    miru 10
        nikugan de miru to itimai no yō desu ga 1
        miru koto 9
            nikugan de miru koto wa dekiru 1
            nikugan de miru koto no dekiru 1
            nikugan de miru koto no dekita 1
            nikugan de miru koto ga dekiru 1
            hosi o miru koto ga dekimasu 1
            nikugan de wa miru koto ga dekinai 1
            nikugan de wa kessite miru koto no dekinai 1
            tentai ha nikugan de sika miru koto ga dekinakatta 1
    mite 6
        nikugan de mite mo sō da si 1
        kenbikyō de mite mite mo 1
        me de mite wakaru 1
        sore o mite 1
        nikugan de mite iru nama no keiken 1
        nikugan de mono o mite iru ga 1
    mirare- 3
        miraremasen 2
            nikugantekiketunyō wa miraremasen 1
            nikugan de wa miraremasen 1
        miraretari 1
            nikugantekiketunyō ga miraretari 1
    mita 3
        nikugan de mita 2
        ironna kakudo kara mita syasin 1
    mitukerare- 2
        mitukerare 1
        mitukerarenai 1
    mikake 1
    mi 1

Kansatu 'observation' 12 (often appears near but does not colligate with nikugan)

Kenbikyō 'microscope' 9 (often contrasted with nikugan)

Kakunin 'confirmation' 6
    Nikugan de kakunin dekimasu 3
    Nikugan de kakunin dekiru 1
    Nikugan de hakkiri to kakunin dekita 1
    Nikugan de hakkiri kakunin dekiru 1

Sinin 'visual confirmation' 2
    Nikugan de sinin dekiru 1
    Nikugan de sinin dekinai 1

Hakkiri 'clearly' 3

Hakken 'discovery' 2

Bōenkyō 'telescope' 2

Nozomu 'see' 1

Compiled from BCCWJ data in Appendix 2.