Toward a Japanese “linguistics of speech”: collocation in the BCCWJ

by Kieran Maynard (2010)

Advisor: Dr. William Kretzschmar

The University of Georgia LING 4080/6080

Keywords: Computational linguistics, Japanese, NLP, corpus linguistics, electronic corpora, semantic analysis

Corpus linguistics will soon benefit from the public release of a new electronic corpus of written Japanese. This study attempts to lay the foundation for an application of Kretzschmar's (2009) “linguistics of speech” to Japanese speech data in the aggregate. We will first characterize the Kotonoha Project, describe our theoretical foundations and some issues specific to Japanese, then apply Sinclair's (2004) analytical methods to Japanese corpus evidence in search of significant collocations and the distributional pattern predicted by the linguistics of speech.

The Kotonoha Project

The advent of computer processing and storage has made the compilation of massive language corpora possible, and the Internet has made them accessible to researchers. Computers enable analysis of large bodies of text, and these analyses have produced startling findings. While much work with corpora has been done with English, Japanese corpora have also been compiled and made available on the Web.

The National Institute for Japanese Language and Linguistics (NIJL) in Tokyo is compiling a “Balanced Corpus of Contemporary Written Japanese” (BCCWJ) slated to be opened to the public in 2011. According to Maekawa's (2007) estimate, the corpus will comprise approximately 100 million words “selected randomly from well-defined statistical populations covering [a] wide range of written texts” (Maekawa 2008). The NIJL defines a “balanced corpus” as one that “as accurately as possible represents contemporary Japanese” (gendai nihongo no jissi no dekiru dake seikaku na syukuzu to naru (“Kokuritsu”)). Previous studies of written Japanese have dealt with material either too old (e.g., copyright-expired literature), not sampled randomly, or skewed in its distribution. Newspaper writing, for instance, is produced for institutions that seek to minimize variation, and Internet writing lacks categorization and copyrighted works.

The BCCWJ will combine three sub-corpora: (1) the “publication” or “production” sub-corpus of 34.7 million words, which “consists of samples extracted randomly from the statistical population covering the whole body of books, magazines, and newspapers published [in Japan] during 2001-2005” (Maekawa 2008); (2) the “library” or “circulation” sub-corpus of 30 million words, sampled from “the whole [of] books registered in at least 13 public libraries in the Tokyo Metropolis” (ibid.); and (3) the “special-purpose” or “out-of-population” sub-corpus of 35 million words, comprising “various special purpose mini corpora” of about 5 million words each, “includ[ing] texts of governmental white papers, Internet text…, minutes of the [N]ational [D]iet, school textbooks, and best-selling books of the past 30 years” (ibid.).

My analysis uses the online demonstration version of the BCCWJ, which comprises “[a]s of September 2007… the full-text query of the 10 million words [of] texts that are copyright cleared [that] are publicly available on the web” (ibid.). The demonstration BCCWJ gives concordance lines for any search term, to a maximum of 500. Additional parameters may be entered in regular language to refine the search on either side of the node. The classifications given to concordance lines are author, author's decade of birth, author's gender, genre, book title/source, subtitle/classification, volume number, compiler, etc., publisher, and misc. notes. Though it is possible to copy the concordance lines into another program, statistical analysis cannot be carried out on the demonstration BCCWJ page; the eventual public release of the corpus will enable more rigorous statistical treatment of the data in this paper.

Words and linguistic features

What are the units of meaning in Japanese? The “word… does not reign unchallenged as the basic unit of language” (Sinclair 2004: 25)—other concepts, like the morpheme, have gained currency, yet the morpheme is often too small a unit for the study of linguistic variation in text. In Japanese text, where word boundaries are not differentiated by orthography, what counts as a word?

John Sinclair (2004) has proposed a model for compound lexical items in the structural analysis of English. In the chapter “The Search for Units of Meaning,” he makes the case for compound lexical items with four major structural categories: collocation, colligation, semantic preference, and semantic prosody. He posits a continuum in the lexis between the “open-choice principle” and the “idiom principle,” exemplified by the “terminological tendency” of words “to have a fixed meaning in reference to the world” (29) and the “phraseological tendency” of words to “go together and make meaning by their combinations” (29), respectively. He hypothesizes that (29-30):

…the notion of a linguistic item can be extended, at least for English, so that units of meaning are expected to be largely phrasal. Some words would still be chosen according to the open choice principle, but probably not very many, depending on the kind of discourse. The idea of a word carrying meaning on its own would be relegated to the margins of linguistic interest, in the enumeration of flora and fauna for example.

Sinclair has used the mid-1995 Bank of English 211 million word corpus to gather collocations for phrases and words to show that words often appear next to other words (collocate) and in certain grammatical patterns (colligate), and that an analyst can further abstract a “semantic preference” and “semantic prosody” (2004: 32-3). Even a common word like place (as it might appear in a sentence like “…She came over to my place with a friend…” (38)) Sinclair describes as “a compound lexical item which has a semantic prosody 'informal invitation,' a semantic preference for 'local travel' which is realized by colligation with a verb of movement and optionally a directional adverb, with come and over as typical collocations” (38).

Distributions of linguistic features and The Linguistics of Speech

William A. Kretzschmar, Jr. in The Linguistics of Speech (2009) goes beyond Sinclair's “phraseology” and defines “linguistic features” based on the work of Ferdinand de Saussure as “anything we can identify as an entity or unit having to do with what people say” (53). A linguistic feature is often larger or smaller than a word, which is in itself a tricky concept to define, but one that continues to influence our perceptions of language (54):

Linguistic features of speech, concrete entities, thus are commonly taken to be different words used for the same referent (synonyms), or alternative morphs or phones used as components of what we identify as the same word, or alternative arrangements of words in what we recognize to be sequences with equivalent meaning or organization. For Saussure, “identity” comes from such acts of recognition, as when we consider the word “messieurs” to be the same word even given variations in “delivery and intonation” by different speakers…

John Firth and others have shown that words derive their meaning from context, not the other way around. Where and how often words occur in texts, their distribution, is therefore very important. Michael Stubbs's (2001) study of English, described in Kretzschmar (2009), used corpora to calculate the rate of co-occurrence for word forms (“node” words) and their collocates, and found that 90% of node words appeared near their top collocate at least 2% of the time, which is still 250 times the probability of co-occurrence by chance. Thus Kretzschmar (2009) posits that “words are not deployed randomly in speech, or evenly spaced, but instead they normally occur in clusters… in proximate association with other words as collocates” (154), therefore “it is… a normal feature of language in use that any given word, when considered as a node word, is likely to have multiple collocates with unexpectedly high rates of co-occurrence” (154-5).
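The comparison of an observed co-occurrence rate with chance can be sketched as follows. This is only an illustration under a simple independence assumption, not Stubbs's own procedure: the function names, the collocate's relative frequency, and the window size are all hypothetical values chosen for the example.

```python
def chance_cooccurrence(collocate_rel_freq, window):
    """Probability that a collocate turns up at least once in `window`
    word slots if words were placed independently at random."""
    return 1.0 - (1.0 - collocate_rel_freq) ** window


def ratio_to_chance(observed_rate, chance_rate):
    """How many times more often the collocate appears than chance predicts."""
    return observed_rate / chance_rate


# Hypothetical inputs: a collocate making up 0.001% of all tokens,
# looked for within 4 words on each side of the node (8 slots).
chance = chance_cooccurrence(1e-5, window=8)

# Observed rate of 2%, the threshold reported in the text above.
observed = 0.02

print(f"chance rate: {chance:.6f}")
print(f"observed / chance: {ratio_to_chance(observed, chance):.0f}")
```

Under these assumed inputs the ratio comes out close to the order of magnitude cited above, which shows how even a modest 2% co-occurrence rate can sit far above the random baseline.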

Kretzschmar (2009) has proposed a new model for the study of language in use, called the “linguistics of speech” as a counterpart to the academic North American “linguistics of linguistic structure” (4), and shown that the distribution of linguistic features in speech and writing is non-linear. When organized into types and tokens and plotted by frequency, linguistic features always display an asymptotic hyperbolic curve, or “A-curve” (197). This distribution has been described before as the “80/20 Rule,” which predicts 20% of all types will account for 80% of all tokens. Vowel realizations, words, collocations, etc. all follow this distributional pattern, as Kretzschmar (2010) explains (20-1):

…the 80/20 Rule for such non-linear distributions (whether the actual percentages are 90/10 or 70/30) tells us that we will always find one or a few constructions that account for the great majority of the instances for the feature under study, and that there will be a large number of variant constructions for the feature that account for a small minority of the instances…

Systems characterized by such a distribution have been observed in other sciences, and now linguistics; they are known as “complex adaptive systems.” Kretzschmar explains (2010: 5):

This more-or-less 80/20 relationship is no mere curiosity but a sign that speech, language as we use it, behaves as a complex system. Complexity in this specialized sense (not just with the usual meaning 'complicated') is a property of many natural phenomena characterized in mathematical descriptions by "curves without tangents," continuous nondifferentiable functions--in other words, this sort of complexity is characterized by A-curves.

The model of the “linguistics of speech” is built on this understanding of the nature of “emergent order” in language, and offers a new way to study speech data in the aggregate.
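A minimal sketch of how the 80/20 relationship might be checked for any set of type frequencies is given below. The counts are hypothetical (100 types whose frequency falls off hyperbolically as 1000/rank), and the function name is our own; real data would come from a corpus tally.

```python
def top_share(counts, top_fraction=0.2):
    """Share of all tokens covered by the most frequent `top_fraction` of types."""
    ranked = sorted(counts, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical token counts for 100 types, falling off as 1000 / rank.
counts = [round(1000 / rank) for rank in range(1, 101)]

print(f"top 20% of types cover {top_share(counts):.0%} of tokens")
```

With a purely hyperbolic fall-off the split is nearer 70/30 than 80/20, which is consistent with the point that the exact percentages vary while the shape of the curve does not.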

The orthography problem

The Japanese writing system complicates the linguistic analysis of Japanese. Written Japanese regularly appears in a combination of four scripts: kanji, hiragana, katakana, and rōmaji. Hiragana and katakana are basically phonemic systems of characters that represent syllables. Rōmaji are Roman characters used primarily in two competing systems (Hepburn and Kunrei) to record Japanese phonemically. Kanji, or Chinese characters, are less straightforward in their phonetic representation.

First introduced to the West by Jesuits and other missionaries, the characters were imagined to represent concepts divorced from the sounds of spoken language, a mistaken perception that persists to this day. Kanji are usually used to represent the vast heritage of Chinese loan words in Japanese, but also many native words (including 19th- and 20th-century coinages using Sino-Japanese morphemes), some Buddhist terms (from Sanskrit, Pali, etc.), nativized Western loans (from Portuguese, Dutch, etc.) and even relatively recent loans like pēzi 'page'. Sometimes different kanji are used to differentiate homonyms, analogous with English <sail> and <sale>, while at other times different morphemes or words considered related by sense, paradigm or tradition will be written with the same kanji. For example, the native word mato 'target' is written with the same kanji as the Sino-Japanese morpheme teki-, and likewise the native naka 'inside' with the same kanji as the Sino-Japanese –tū in tekitū 'strike home'. Though their phonetic value is often opaque, kanji are certainly not ideograms, as their primary function in most Japanese text is the same as an alphabetic or any other writing system for human language: to record, even at a rough approximation, the sounds of spoken language. Kanji have at other times been described as morphemic or logographic; DeFrancis (1984) has proposed the term “morphosyllabic.”

While the Japanese government sets standards for the use of kanji, variation in use and their morphosyllabic nature may create ambiguity in which word kanji are intended to record. For this reason a potentially ambiguous search term consulted in the corpus in kanji must be checked against its context to confirm its “reading”: the spoken word, morpheme or syllable it represents. Some terms need to be searched in multiple orthographic forms (possible through the use of Boolean operators in the demo BCCWJ), as words may be written interchangeably in different systems according to tradition, visual appeal, etc.
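Outside the demo interface, a search over several orthographic forms can be sketched with a regular-expression alternation. The example below does not reproduce the demo BCCWJ's query syntax. The helper name is hypothetical, the variant spellings are those attested for au after itai me ni in Appendix 3, and the sample lines are joined from that appendix for illustration.

```python
import re
from collections import Counter


def variant_pattern(prefix, variants):
    """Build a regex matching `prefix` followed by any one of the variant spellings."""
    alternatives = "|".join(re.escape(v) for v in variants)
    return re.compile(re.escape(prefix) + "(" + alternatives + ")")


# Spellings of the verb a- 'meet' after itai me ni (forms attested in Appendix 3).
pattern = variant_pattern("痛い目に", ["あ", "会", "合", "遭", "有"])

# Sample lines joined from the Appendix 3 concordance.
lines = [
    "ドライブを値段だけで選ぶと痛い目にあいます。",
    "ネットしてたら痛い目に会うぞ！！！",
    "あとで必ず痛い目に合う。",
    "舞い上がってると痛い目に遭うから、ほどほどにね",
    "痛い目に有った分良い文章が書ける感じがします。",
    "実際に痛い目を見ることになる前に",
]

# Tally which spelling each matching line uses.
tally = Counter(m.group(1) for line in lines if (m := pattern.search(line)))
print(tally)
```

A pattern of this kind only gathers candidate lines; each hit still has to be read in context to confirm which word the kanji records.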

The NIJL's sampling method is actually based on characters rather than words. Explaining the word-estimation formulas for the Corpus of Spontaneous Japanese (CSJ) produced by the NIJL, Maekawa (2007) warns, “…word boundary in Japanese is heavily theory dependent and hence is not reflected in ordinary orthography.” It may be argued, though, that the orthography itself exacerbates or even causes this problem, as word boundary in any language, as demonstrated by Sinclair (2004) and Kretzschmar (2009), is “heavily theory dependent.”

Naked eyes: ragan and nikugan

Sinclair (2004) analyzes the exemplary expression “naked eye” thus (34):

The speaker/writer selects a prosody of difficulty applied to a semantic preference of visibility. The semantic preference controls the collocational and colligational patterns, and is divided into verbs, typically see, and adjectives, typically visible. With see, etc., there is a strong colligation with modals – particularly can, could in the expression of difficulty – and with the preposition with to link with the final segment. With visible, etc., the pattern of collocation is principally with degree adverbs, and the negative morpheme in-; the following preposition is to. The final component of the item is the core, the almost invariable phrase the naked eye.

As he points out, the phrase is semantically opaque; “unclothed organ of sight” (2004: 31) is not enough to deduce the meaning, and “naked in the collocation naked eye could equally well mean… 'without spectacles, contact lenses, etc.'” (31). In fact Japanese has just such a word, ragan, a Sino-Japanese compound comprising the morphemes /ra/ 'naked' (e.g. ratai 'a nude') and /gan/ 'eye' (e.g. gankyū 'eyeball') and defined as “the eye without the aid of corrective lenses” (Kondō & Takano 2001). The BCCWJ contains 20 instances of ragan, which collocates with siryoku 'eyesight' 11 times (4 times at N+1, in the compound ragansiryoku 'uncorrected vision'; see appendix 1). As ragan collocates 9 times with the instrumental particle de at N+1, as in ragan-de mi-te (ragan-Instr see-gerund), we can posit colligation with de. The semantic preference we may describe as “visibility”; visual activities like eiga 'film', pasokon 'computer', and yomi 'reading' appear to the left of the node. The semantic prosody, however, differs from that of English naked eye. Numbers used in the prescription of glasses appear in 11 lines, and 4 lines contain warui 'bad' as a comment on me 'eye': ragan suggests a semantic prosody of ophthalmology.
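Counts of this kind can be approximated mechanically once the concordance lines are copied out of the demonstration page. The sketch below assumes tab-separated rows in the layout of Appendix 1 (number, left context, node, right context); the function names are our own, and only five rows are included for brevity.

```python
def parse_kwic(row):
    """Split one tab-separated concordance row into (left, node, right)."""
    _number, left, node, right = row.split("\t")
    return left, node, right


def count_at_n_plus_1(rows, form):
    """Count rows whose right context begins with `form` (position N+1)."""
    return sum(1 for row in rows if parse_kwic(row)[2].startswith(form))


def count_in_context(rows, form):
    """Count rows where `form` occurs anywhere in the left or right context."""
    total = 0
    for row in rows:
        left, _node, right = parse_kwic(row)
        if form in left or form in right:
            total += 1
    return total


# Five rows copied from Appendix 1 (rows 2, 5, 11, 12, 17).
rows = [
    "2\tも年々増加傾向にある。次に高いものは，「\t裸眼\t視力１．０未満の者」であり，小学校２０．",
    "5\t硬い文章を読み慣れていない方だけでなく、\t裸眼\t視力二・〇のわたしでさえ、虫眼鏡の力を借",
    "11\t視力が悪いのに無理して\t裸眼\tで見て目つきが悪くなっている女性と、メガ",
    "12\tれば、覗きは一瞬の行為である。まず近視の\t裸眼\tで見て、入浴を確かめてから眼鏡をかけるな",
    "17\tしかし、車とパソコンと人を探すとき以外は\t裸眼\tです。　もちろんテレビも。　ちゃんと２ｍ",
]

print("siryoku at N+1:", count_at_n_plus_1(rows, "視力"))
print("siryoku anywhere:", count_in_context(rows, "視力"))
print("de at N+1 (raw):", count_at_n_plus_1(rows, "で"))
print("desu at N+1:", count_at_n_plus_1(rows, "です"))
```

A plain prefix match cannot tell the particle de from the first character of the copula desu, so the raw figure for de overcounts. This is one reason the counts reported here were checked by hand against each line.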

The term naked eye is more akin to the Japanese nikugan. The morpheme /niku/ 'flesh' (e.g. nikutai '(the physical) body') is combined with /gan/ 'eye' in /nikugan/, defined as “the eye possessed of the human body; natural eyesight without the use of a telescope, microscope, etc.” (Daijisen 1998). The BCCWJ contains 86 instances of nikugan, which collocates with miru 'to see' (in its various forms) 69 times (see appendix 2). At N+1 the instrumental particle de appears in 70% of the examples (two more simply separate nikugan and de with other instruments in a list). Collocation with the suffix teki makes nikugan an adjective in 14 cases (in the sense 'macroscopic'); in 71% of these cases (10 lines, or 11.6% of the total) nikugan appears in the compound nikugantekiketunyō 'macroscopic hematuria.' An 80% collocation with miru 'see' (conflating inflected forms) and an additional 14% with kansatu 'observation' once again suggest a semantic preference for “visibility.” Of the collocations with miru, 72% involve possibility, divided 26:24 between expressions of the possibility and impossibility of seeing something with the naked eye; it seems nikugan carries a semantic prosody similar to that of naked eye, described by Sinclair (2004) as “difficulty” (33), which “may just be hinted at by a modal verb such as can or could or more directly by a negative with 'visibility'” (43). For example:

nikugan-de mi-e-ru wakusei

naked eye-Instr see-Pot-nonpast planet

'planet visible to the naked eye'

Failure to see something with the naked eye accounts for 29% of all cases, as in:

nikugan-de-wa mi-ru-koto-ga deki-nai mikuro-no-sekai

naked eye-Instr-Top see-nonpast-thing-Nom can-neg micro-Gen-world

'microscopic world that cannot be seen by the naked eye'

The use of nikugan in the affirmative seems to imply that the visibility is unusual, as in:

Tusima-kara taigan-no-kankoku-o nikugan-de nozomu-koto-ga dekiru

Tsushima-from opposite shore-Gen-Korea-Acc naked eye-Instr see-thing-Nom possible

'from Tsushima Island the Korean shore can be seen with the naked eye'

Phraseology: itai me ni au

The next phrase we will examine, itai me ni au 'run into trouble', exhibits a strong phraseological tendency. Daijisen (1998) lists the entire phrase, defined as “experience pain or suffering; have an awful experience,” along with the synonymous phrases hidoi me ni au and itai me o miru. The phrase me ni au 'have an experience' appears 600 times in the BCCWJ, and approximately 8% of these are itai me ni au. The phrase itai me occurs 65 times (see appendix 3). Of these, 72% are in some permutation of the phrase itai me ni au:

yudan-si-te-i-ru-to, ita-i me-ni a-u-kara-ne! (24)

neglect-do-gerund-is-nonpast-Cond, painful-nonpast experience-to meet-nonpast

''cause if you don't pay attention you'll run into trouble!'

A further 26% are in itai me o miru:

te-o-da-su-to ita-i me-o-mi-ru (61)

hand-Acc-send out-nonpast-Cond painful-nonpast experience-Acc see-nonpast

'if you make a move you'll be in trouble'

In colligation, itai me ni au shows very little variation in particle use: 46 cases of itai me ni and 1 case of itai me ni mo. The 17 cases of itai me o miru show the variation common with the direct object particle o in general: 5 cases of elision (61: itai me miru) and 1 case of topicalization (62: itai me wa mite mo sikata nai).

Conclusions

Having looked at three linguistic features, we should notice a recurrent distributional pattern: 70% collocation with de and 80% collocation of miru with nikugan, 72% possibility expressions versus others within the miru collocates, and 72% collocation of itai me with ni au within all itai me instances. These distributions exemplify the 80/20 Rule; in other words, the A-curve distribution is “robust” enough to appear even in our rudimentary statistical analysis (see appendix 4). No matter what linguistic feature is chosen, be it a particle (morpheme), word or phrase, the BCCWJ data exhibits significant clustering as expected in the linguistics of speech. Linguists should feel confident they will make new discoveries when the BCCWJ data is released in full.
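The percentages summarized above can be recomputed from the raw counts given in the text. The sketch below uses only counts stated in this paper; the variable names and the helper pct are our own.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Raw counts as reported in the text.
nikugan_total = 86
miru_with_nikugan = 69
possibility = 26 + 24       # possible : impossible
itai_me_total = 65
itai_me_ni_au = 46 + 1      # itai me ni + itai me ni mo
itai_me_o_miru = 17

shares = {
    "miru / nikugan": pct(miru_with_nikugan, nikugan_total),
    "possibility / miru": pct(possibility, miru_with_nikugan),
    "itai me ni au / itai me": pct(itai_me_ni_au, itai_me_total),
    "itai me o miru / itai me": pct(itai_me_o_miru, itai_me_total),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

Keeping the raw counts next to the percentages makes it easier to repeat the calculation once the full corpus is released and the counts change.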

Works cited

Aronoff, Mark, and Kirsten Anne Fudeman. "Words and Lexemes." What Is Morphology? Malden, MA: Blackwell Pub., 2005. Print.
Barfield, Andrew, and Henrik Gyllstad. Researching Collocations in Another Language: Multiple Interpretations. New York: Palgrave Macmillan, 2009. Print.
DeFrancis, John. "The Ideographic Myth." The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii, 1984. Web. 2 Dec. 2010.
Digital Daijisen. Tokyo: Shogakukan, 1998. Web. 2 Dec. 2010.
Hasegawa, Yoko. "The Tense-aspect Controversy Revisited: The -ta and -ru Forms in Japanese." Pragmatics in 1998: Selected Papers from the 6th International Pragmatics Conference. Ed. Jef Verschueren. Vol. 2. Antwerpen: International Pragmatics Association, 1999. 225-40. Web. 3 Dec. 2010.
"Kokuritsu Gengo Kenkyūsho No Gengo Kōpasu Seibi Keikaku Kotonoha." National Institute for Japanese Language and Linguistics. Web. 3 Dec. 2010. <http://www.ninjal.ac.jp/kotonoha/>.
Kondō, Ineko, and Fumi Takano. Puroguresshibu Waei Chūjiten ["Progressive" Japanese-English Dictionary]. 3rd ed. Shogakukan, 2001. Kotobank. Web. 5 Dec. 2010.
"Kotonoha Gendai Nihongo Kakikotoba Kinkō Kōpasu Kensaku Demonsutorēshon." National Institute for Japanese Language and Linguistics. Web. 11 Nov. 2010. <http://www.kotonoha.gr.jp/demo/>.
Kretzschmar, William A., Jr. "The 80/20 Rule in English Grammar." Proc. of NAES-FINSSE 2010, Oulu. Web. 5 Dec. 2010.
Kretzschmar, William A., Jr. The Linguistics of Speech. Cambridge: Cambridge UP, 2009. Print.
Maekawa, Kikuo. "Balanced Corpus of Contemporary Written Japanese." Proc. of The 6th Workshop on Asian Language Resources, 2008, Hyderabad, India. Web. 1 Dec. 2010.
Maekawa, Kikuo. "KOTONOHA and BCCWJ: Development of a Balanced Corpus of Contemporary Written Japanese." Corpora and Language Research: Proceedings of the First International Conference on Korean Language, Literature, and Culture. Seoul, 2007. Web. 4 Dec. 2010.
Maekawa, Kikuo. "Quantitative Analysis of Word-form Variation Using a Spontaneous Speech Corpus." Proc. of Corpus Linguistics 2005, Birmingham. Web. 4 Dec. 2010.
Sano, Motoki, and Takehiko Maruyama. "Lexical Density in Japanese Texts: Classifying Text Samples in the Balanced Corpus of Contemporary Written Japanese (BCCWJ)." Proceedings of ISFC 35: Voices Around the World. Ed. Canzhong Wu, Christian M.I.M. Matthiessen, and Maria Herke. Sydney, 2008. Web. 4 Dec. 2010.
Sinclair, John, and Ronald Carter. "The Search for Units of Meaning." Trust the Text: Language, Corpus and Discourse. London: Routledge, 2004. Print.
Thomson, Elizabeth A. "Theme Unit Analysis: A Systemic Functional Treatment of Textual Meanings in Japanese." Functions of Language 12.2 (2005): 151-79. Web. 4 Dec. 2010.
Tsujimura, Natsuko. An Introduction to Japanese Linguistics. Malden, MA: Blackwell Pub., 2007. Web. 2 Dec. 2010.

Appendix 1: ragan

Display no.	Preceding context	Search string	Following context
1	をかえてもらおうと思っています。　　私は	裸眼	０．０５です。　　１番見えるようにしてと
2	も年々増加傾向にある。次に高いものは，「	裸眼	視力１．０未満の者」であり，小学校２０．
3	も年々増加傾向にある。次に高いものは，「	裸眼	視力１．０未満の者」であり，小学校１９．
4	以上の疾病異常である。２　「近視」とは，	裸眼	視力１．０未満のもので矯正視力検査の結果
5	硬い文章を読み慣れていない方だけでなく、	裸眼	視力二・〇のわたしでさえ、虫眼鏡の力を借
6	あんまり効かないので、、、＾＾；　でも、	裸眼	のままだと辛いです。　目が悪いのでコンタ
7	も　１．２が限界みたいです。　　あなたの	裸眼	にもよって　　合わせられる度が違うと思い
8	しても思うツボなわけじゃん。凄い男だな。	裸眼	でＧＯ！（吉田美紀子）『オフィスグッズ』
9	方の目が遠視だということがわかりました。	裸眼	で１．０と悪いほうの眼は０．５～０．８く
10	？　０．３なのですが、映画・運転以外は、	裸眼	で通しています。　メガネが似合わない・コ
11	視力が悪いのに無理して	裸眼	で見て目つきが悪くなっている女性と、メガ
12	れば、覗きは一瞬の行為である。まず近視の	裸眼	で見て、入浴を確かめてから眼鏡をかけるな
13	視力があり約一メートルの近距離であれば、	裸眼	で十分目的を達しうるものであることは経験
14	もなかった。が、「君は大丈夫、そのままの	裸眼	で充分立派な視力だと思うよ」と説明しても
15	けど例えば視力が０．５とか０．６だったら	裸眼	でも生活上支障はないなら受ける必要性はな
16	対象という意味かもしれません）どうしても	裸眼	での視力を回復させたい、という方にはもっ
17	しかし、車とパソコンと人を探すとき以外は	裸眼	です。　もちろんテレビも。　ちゃんと２ｍ
18	ほうがいいことってなんですか？　今までは	裸眼	です。　あまり強いものだと、慣れるまで
19	タクトが面倒・怖いという理由で、なるべく	裸眼	でいたいのですが・・・。　現在視力０．１
20	目撃したのは、わずか数秒間、視力〇・一の	裸眼	で、一度に顔全部は見えないような細い透き

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 2: nikugan

Display no.	Preceding context	Search string	Following context
1	観察する。何種類かの葉を観察する。(４)	肉眼	，水滴レンズ，ルーペの観察結果について，
2	って観察しよう　昔の人になったつもりで，	肉眼	，水滴レンズ，ルーペで身近なものを観察し
3	ついただいて、比べてみました。種子の翼は	肉眼	（私はド近眼）ではよく見えません。１０倍
4	く知っていて郭清しなければなりません。	肉眼	的（触診や視診）には、転移の有無がわから
5	ひいたときにはじめて目で見てわかる血尿（	肉眼	的血尿）が出た。それを見てお母さんはかな
6	れば、患児は数日間活動を抑えてもよい。	肉眼	的血尿発作回数が運動制限によって減ること
7	う。　「腎障害に対する特殊な治療はない。	肉眼	的血尿発作はひとりでになおるものであり、
8	ことが多いです。●臨床像顕微鏡的あるいは	肉眼	的血尿を伴い、患側の腎部殴打痛を認めます
9	床的に高度蛋白尿や高血圧の存在があるが、	肉眼	的血尿やネフローゼ症候群は関係がない。こ
10	尿で発症する例もあるが，この場合は数日で	肉眼	的血尿は消失し，その後顕微鏡的血尿が持続
11	＋３の結果が出て、再検しても同様でした。	肉眼	的血尿はみられません。最近は両側背の痛み
12	れなりに検査、治療してると思うので。	肉眼	的血尿はなくとも、３＋は安心できる値では
13	発見される例が多い．上気道感染に引き続き	肉眼	的血尿で発症する例もあるが，この場合は数
14	見されることが多く、上気道炎や下痢の時に	肉眼	的血尿がみられたり、急性腎炎症候群やネフ
15	素晴らしい展望です♪	肉眼	的には申し分ない、展望ですが、
16	、一六二八年のことである。ハーヴィーは、	肉眼	的な観察によって、動脈から器官に入った血
17	筋肉（上腕二頭筋）とかである。こういう、	肉眼	的な形のある構造を器官（ｏｒｇａｎ）とい
18	開け、コックピットから半身を乗り出すや、	肉眼	のみを頼りに照準を試みたのだ。　高度はあ
19	魂　　＝　太陽　＝５次元★闇の力の根源は	肉眼	に見える月ではなく、肉眼に見えない黒い月
20	★闇の力の根源は肉眼に見える月ではなく、	肉眼	に見えない黒い月である、リリットから来る
21	と思う」と述べている。たしかに、かつての	肉眼	による魚骨採集では、東京湾奥の中期貝塚と
22	でした。そんな大感動のサクラタデですが、	肉眼	とレンズを通したのとでは、まるっきり色合
23	にして，これを感材に密着しＲＩの分布像を	肉眼	で観察するものである．マクロオートラジオ
24	着が近づく。　着弾時の水柱が、はっきりと	肉眼	で視認できる位置に噴き上がり、海水を伝わ
25	測定方式及びマルチバンドカメラを利用して	肉眼	で視認できない海中の噴出物等の状況をは握
26	ウン管は元々残像を利用しているのです。	肉眼	で見ると一枚のようですが、レーザーでビー
27	処から発射されるのでしょうか？その様子を	肉眼	で見ることは出来るものでしょうか？　打上
28	地球の表面の三分の二は海だし、私たちが	肉眼	で見ることのできるのは海面というただの皮
29	相関図に気付くはずがない。　彼らが実際、	肉眼	で見ることのできた渦の形といえばなんだろ
30	学を超えなければならないのです。　人間が	肉眼	で見ることができる宇宙の星は、何百光年、
31	大きなものが出来上がっていく。われわれが	肉眼	で見ることができるのは、三原子体以上であ
32	みたが、いうほど高品質にも見えなかった。	肉眼	で見てもそうだし、顕微鏡で見てみても、そ
33	移動撮影しているときこそ、まだ足で立って	肉眼	で見ている生の経験に近いが、受容者にとっ
34	、「共産主義の支配」という歴史的必然性を	肉眼	で見たと思ったのだろう。弾圧に屈せず、戦
35	たエネルギーの量を意味している。地球から	肉眼	で見たときの等級を見かけの等級といい，こ
36	ります。２５パーセクで７等星になります。	肉眼	で見える限界は、５等か６等だったと思いま
37	治の頃からのようです。　　七曜というのは	肉眼	で見える惑星の火星・水星・木星・金星・土
38	うな場所があるのか。　人間の身体の中に、	肉眼	で見えるような形のある構造がいくつもある
39	る】　仏といっても浄土といっても、それは	肉眼	で見えるものではありません。絵像や木像の
40	輝く星が見えたのですが、　もしかしたら、	肉眼	で見えると話題になっている、彗星でしょう
41	見えない光」をまとめて「電磁波」と呼ぶ。	肉眼	で見える「可視光」は電磁波のほんの一部に
42	る。電波は，太陽のような恒星はもちろん，	肉眼	で見えない低温で希薄なガスの中の分子や原
43	し、話をしません。しかし神父はその姿形が	肉眼	で確認できるし、話もする、悩みも聴いてく
44	の団体でわさわさと海から上がってくるので	肉眼	で確認できますが、けっこう小さいので、双
45	でしょうかね？？？？？？？？　毛じらみは	肉眼	で確認できます。もし毛じらみを見つけられ
46	ャーでも解ります。　ＰＣのケース開けたら	肉眼	で確認できます。
47	れいに晴れた日なら、対馬から対岸の韓国を	肉眼	で望むことができる。対馬海峡は朝鮮半島ま
48	なりません。「私の中に何かが生きている。	肉眼	で世界の色彩と形態を見、芸術的に変容させ
49	がった空に浮かぶ黒い機影が、地上からでも	肉眼	でハッキリと確認できた。「撃て！　撃て！
50	らないのでしょうか？　再生はできますが、	肉眼	でカビが見えるなら　画質はひどいですよ。
51	アンデスの山からは星々が美しく見えます。	肉眼	でも八千個にのぼる星を見ることができます
52	)　生物の部分を拡大する　私たちはふだん	肉眼	でものを見ているが，詳しく観察するために
53	トを、土、日曜をかけて観察する。もとより	肉眼	でみえる世界ではなく、高倍率の顕微鏡をも
54	なる動物，植物，魚がいる．しかし，これは	肉眼	でみえる世界での生物にすぎない．実は，海
55	：０．０２５倍）(Ｂ)　連星とその質量	肉眼	では１つにしか見えない恒星でも望遠鏡では
56	ものを見ているが，詳しく観察するためには	肉眼	では限界がある。そこでレンズという道具を
57	てみよう。　人体の最小単位は細胞である。	肉眼	では見ることができないミクロの世界から出
58	舞台には音声装置がありません。この舞台は	肉眼	では見られません。舞台も演者も観客もその
59	いう。フィルムとフィルターのいたずらが、	肉眼	では見えぬ地下の遺跡を透視するのだと、そ
60	。ごらんになれますでしょうか、小さすぎて	肉眼	では見えにくいかもしれません。もともと虫
61	物でも存在がたしかなものは沢山あります。	肉眼	では見えなくても、顕微鏡のような眼鏡をか
62	あるが、トールキンは指輪の姿を消す原理を	肉眼	では見えない霊的な世界に行ったためと説明
63	している。彼らは土を「食」として、私達の	肉眼	では見えない生命現象を果てしもなく行って
64	だくのです。　前にも述べましたが、仏壇も	肉眼	では見えない浄土をあらわそうとしたもので
65	らにたどっていくと、枝分かれを繰り返して	肉眼	では見えないほどに細くなってしまう。静脈
66	うなもようが見える。このことから，金は，	肉眼	では見えないきわめて小さな粒子が規則正し
67	っていると思われますので、どちらか一つは	肉眼	では見えないかもしれません。　条件が良い
68	、小さな球がいっぱい詰まっている。むろん	肉眼	では見えず、顕微鏡を使って初めてわかる。
69	腎炎というものは蛋白尿と、顕微鏡的血尿（	肉眼	では血液がまじっていることがわからないが
70	すのは、簡単なようでなかなかできません。	肉眼	では立体的に見えるんですが、写真のような
71	ら、千年二千年前の遺跡を発見する技術で、	肉眼	では決して見ることのできない地下の遺跡が
72	したイラク軍の位置をＧＰＳに入力し、全く	肉眼	では敵が見えない状態でも長距離砲弾、多連
73	、高精度のデジタルカメラや赤外線などで、	肉眼	では描線が見えない白虎の背中や目、前脚な
74	あった。　十一月末のレントゲン写真では、	肉眼	ではもうほとんど見えないくらいまで消えて
75	，湖など水中あるいは土の中や大気中にも，	肉眼	ではみえない数多くの生物が生存しているの
76	利用〈微生物とよばれるもの〉　生物の中で	肉眼	ではほとんど見えず，顕微鏡や電子顕微鏡で
77	ップです、いろんな角度から見た写真です。	肉眼	ではなかなか見つけられない形もアップにす
78	テラノーバという寄生虫らしいのですが、	肉眼	ではっきり確認できるものなのでしょうか？
79	ンデジで撮った夜景。今度は暗すぎる？！	肉眼	ではすごく大きく見えました！！そして、と
80	報により　高台へ　ゆきました。	肉眼	では　うっすら　見えたのですが・・・。う
81	して観察することになる。レンズを使うと，	肉眼	での観察に比べてどのような見え方の違いが
82	紀になって望遠鏡が発明されるまで，天体は	肉眼	でしか見ることができなかった。しかし１６
83	。陽性か陰性かの最終判定は検査技師たちの	肉眼	で。（同）献血は「危険物」？　さて、血液
84	について，下の考察の観点から話し合う。・	肉眼	だけのときとレンズを使ったときとでは見え
85	し）かの判定は、最終的には検査技師たちの	肉眼	が下すそうだ。　と同時に、別の検体は「Ｎ
86	照射方法　照射方法は２種に大別されます。	肉眼	、直視下でハンドピース（ＣＯ２）またはロ

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 3: itai me

Display no.	Preceding context	Search string	Following context
1	けられない」「一度、友達の保証人になって	痛い目	みてるから無理」でしょうか。　兎に角、金
2	にも三千にも見えたのである。　木曾軍に手	痛い目	に合わされた信長は木曾軍に対して恐怖をお
3	情的にからんできます。　他の営業マンから	痛い目	にあっているので、お返しのつもりとも受け
4	こんな回答を書き込むと、以下のような	痛い目	にあいます。　　上に挙げた例のように物理
5	び人　ではないんじゃないでしょうか？	痛い目	みますよ　　既婚者はパートナーのところに
6	入れとくのが無難。［　印象　＝　軽視して	痛い目	に合わされやすい　］●７枠１４番＝リキッ
7	走ってましたから。　　サンライズは宝塚で	痛い目	にあってましたのでいつかはと思ってました
8	もあります。　ドライブを値段だけで選ぶと	痛い目	にあいます。　お勧めはＲＡＭ不要ならパイ
9	んですけどね〜♪さすがに無理すると年明け	痛い目	みるよ思いまして安静にしてました（＊＾皿
10	陥れ、四月には、遠州森の一之瀬で徳川軍を	痛い目	に合わせた。その昌幸が長島に潜入して、軍
11	たタビのことを思い出し、「ああ、大ボス、	痛い目	にあって辛かったんやね。もう二度と帰って
12	ら、値動きが荒っぽくて、高値づかみすると	痛い目	にあいます。この荒い値動きに食いつくため
13	いけないのです。なぜ？　今痛くても、後で	痛い目	をしなくてすむからなんです。今注射をしな
14	ない感じがします、新米オーナー様のほうが	痛い目	に有った分良い文章が書ける感じがします。
15	な気がした。このままではいずれこちらが手	痛い目	にあわされるのではないか。このアメリカ人
16	仕事はありません。　世の中なめてかかると	痛い目	にあいますよ。　　　兜町のクワガタムシ
17	てやるつもりだった。体に痕の残らぬ程度に	痛い目	を見せて、剣を手から打ち落としてやればあ
18	リノスも不安だな。」セイジ「マリノスには	痛い目	に遭いまくったからね。これが結末になるこ
19	んとしても対面させたかった。ハーウッドが	痛い目	にあわされるのはまちがいない。やつは女に
20	てきませんでしたけどね。あんなやつは早晩	痛い目	にあいますよ。てかあってもらわなきゃ困る
21	。娘を殴ったら、お袋さんが騒いでな。少し	痛い目	を見たようだ」　亜紀子は目を伏せた。　あ
22	同じようについついインターネットをすると	痛い目	に遭いますよパケ放題は、日本国内からのア
23	鮮な驚きを与えました。そこには、ときには	痛い目	にも遭いながら、わたしが現実から教えられ
24	もっこり』　だけどぉ・・・油断してると、	痛い目	にあうからねぇ！　　（ノ∇＾＊）　キャハ
25	ちなみに旦那は以前、風俗で病気移されて	痛い目	を見てるので、風俗には行かないと言ってま
26	ないか…これらの機器に全幅の信頼をおくと	痛い目	に遭うかもしれない。
27	！！　そんな事も知らずに、ネットしてたら	痛い目	に会うぞ！！！　　しかしまだそのネタして
28	ま、世の中平等なら、そんな会社もいずれ	痛い目	にあうでしょうね。　無謀な勤務実態は明る
29	もやっぱりそうです。なんていうか、実際に	痛い目	を見ることになる前に、防衛本能が働くんで
30	混じりに答えた。「ただ、舞い上がってると	痛い目	に遭うから、ほどほどにね」　そしてスキッ
31	かり気にして、男と男てえ関係を忘れてると	痛い目	に会うという…」　「男と男ねえ、なるほど
32	ね。これで久保田は大丈夫と思ったら、また	痛い目	にあうと思うんだけど・・そして最後は藤川
33	な男なのだ。本気になってしまったら自分が	痛い目	を見るだけだとわかっているが、黒坂はいち
34	たからです」　梶田の過去を探りまわるな。	痛い目	に遭うぞ。そこまではいい。だが、問題はそ
35	っているのですか―」　「以前に、一度、手	痛い目	に会わされたことがある」　「力の加減を知
36	き人間が自分に知恵が足りないことによって	痛い目	にあうのである。　こういったことを言って
37	らって結婚すれば～？　そうですよね～	痛い目	を見る前に、　目がさめて欲しいですよね･
38	るのですがこれは見極めを間違うとドエライ	痛い目	に遭うのでまた次回お話しようと思います。
39	」　典善も、小さく口元をゆるませた。　手	痛い目	に会わされたと言ってはいるが、この男も乱
40	迷惑かけます。　だいたい貧血をあまくみて	痛い目	にあうひと多いです。　とかいって、私は「
41	みたって無益なのかもしれませんから。と、	痛い目	を見る度に、そうやって自分を正当化して振
42	いますが。今の世の中ぬくぬくしている人が	痛い目	に遭うのもそお遠くないでしょう、政治では
43	で登録して一人と知り合いましたが　結果、	痛い目	に合いました。　結婚したいという目的は同
44	てわかったの。何事も、ＴＰＯを守らないと	痛い目	にあうんだ、って。夏祭りの夜、ちゃんと誘
45	う。今回のオリンピックは、真面目な人ほど	痛い目	を見る気がします。それは北京オリンピック
46	こをきちんと考えなければいけない。実際、	痛い目	に遭っているのは住民なのだ。だから事故が
47	いいかげんな応対をしていては、あとで必ず	痛い目	に合う。親しみをもって近づく　あるソフト
48	体がそれを覚えるまでどんな苦労や失敗や	痛い目	にあったか、裸馬の背骨で腿がすり切れたり
49	のがわからないんでしょうか？？　ちょっと	痛い目	見たほうがいいですねぇ～。ドラ１で入って
50	は、緊張した声で、「何かあったのか」　「	痛い目	に遭わされたようです。そう言っていました
51	か？　両方！！　外見だけ見て判断してると	痛い目	に合うし、　中身だけだと、飽きる。　だか
52	軍のマスコミ操作報道により戦争につっ走り	痛い目	にあったことを忘れたのか！　一日も早く基
53	なくこっちに回してきやがったのさ。体練で	痛い目	見りゃあ嫌でも気が変わるだろうってな。そ
54	に慣れていますね」　「記者にはいろいろと	痛い目	に遭わされているからな。大切なのは、こち
55	じゃあ、誰のだと言うんだ。答え次第では、	痛い目	に合うぞ」　「伍長」隅倉は荒巻に言った。
56	言わなかった。一号機の第一回定期点検の時	痛い目	にあったことを忘れていなかったのである。
57	えのチームメイトのお嬢さんたちも、みんな	痛い目	見ることになるぞ。俺たちの組織にはな、ち
58	求されるというようなことで、大変消費者が	痛い目	に遭わされるというような場合があるようで
59	けます。　　若者はまだ未来があるので将来	痛い目	に合うだろうからいいけど、　　若者に限定
60	昨年十一月の加藤政局で、加藤さんが一番手	痛い目	にあった人だ。それなら、加藤派からだれか
61	。　　彼女や人妻に言い寄られて手を出すと	痛い目	見るのでやめましょう。パートナーを裏切る
62	いる。吉岡の喜ぶ顔を見るためなら、少々の	痛い目	は見てもしかたないと、あぐりは静かなる絶
63	さらではないか。　西洋でも、これらにより	痛い目	に合ったという事例が事欠かないのだろう。
64	真田昌幸の指揮する乱破部隊に捕捉されて手	痛い目	にあって、城に逃げ込もうとする。その後を
65	ね。　こういう一人の為に銀魂ファン全体が	痛い目	で見られるのがつらいです。

Retrieved from BCCWJ demonstration version (http://www.kotonoha.gr.jp/).

Appendix 4: distribution of nikugan collocations

Mi- 'see' 69
    mie- 41
        miena- 17
            mienai 15
            nikugan de wa mienakute mo 1
            iu hodo kōhinsitu ni wa mienakatta 1
        mieru 13
        miezu 2
        mieta 2
        miemasen 1
        miemasu 1
        miemasita 1
        nikugan de wa mienikui 1
        nikugan de wa mienu 1
        mie (end of line) 1
        nikugan de no kansatu ni kurabete dono yō na miekata no tigai 1
    miru 10
        nikugan de miru to itimai no yō desu ga 1
        miru koto 9
            nikugan de miru koto wa dekiru 1
            nikugan de miru koto no dekiru 1
            nikugan de miru koto no dekita 1
            nikugan de miru koto ga dekiru 1
            hosi o miru koto ga dekimasu 1
            nikugan de wa miru koto ga dekinai 1
            nikugan de wa kessite miru koto no dekinai 1
            tentai wa nikugan de sika miru koto ga dekinakatta 1
    mite 6
        nikugan de mite mo sō da si 1
        kenbikyō de mite mite mo 1
        me de mite wakaru 1
        sore o mite 1
        nikugan de mite iru nama no keiken 1
        nikugan de mono o mite iru ga 1
    mirare- 3
        miraremasen 2
            nikugantekiketunyō wa miraremasen 1
            nikugan de wa miraremasen 1
        miraretari 1
            nikugantekiketunyō ga miraretari 1
    mita 3
        nikugan de mita 2
        ironna kakudo kara mita syasin 1
    mitukerare- 2
        mitukerare 1
        mitukerarenai 1
    mikake 1
    mi 1

Kansatu 'observation' 12 (often appears near but does not colligate with nikugan)

Kenbikyō 'microscope' 9 (often contrasted with nikugan)

Kakunin 'confirmation' 6
    nikugan de kakunin dekimasu 3
    nikugan de kakunin dekiru 1
    nikugan de hakkiri to kakunin dekita 1
    nikugan de hakkiri kakunin dekiru 1

Sinin 'visual confirmation' 2
    nikugan de sinin dekiru 1
    nikugan de sinin dekinai 1

Hakkiri 'clearly' 3

Hakken 'discovery' 2

Bōenkyō 'telescope' 2

Nozomu 'see' 1

Compiled from BCCWJ data in Appendix 2.
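The top-level collocate counts in this appendix can be ranked and accumulated to show the shape of the distribution. The sketch below uses only the counts listed above; the function name is our own. Because a single concordance line may contain more than one collocate, the shares are proportions of collocate mentions, not of the 86 lines.

```python
# Top-level collocate counts as listed in Appendix 4.
collocates = {
    "mi- 'see'": 69,
    "kansatu 'observation'": 12,
    "kenbikyō 'microscope'": 9,
    "kakunin 'confirmation'": 6,
    "hakkiri 'clearly'": 3,
    "sinin 'visual confirmation'": 2,
    "hakken 'discovery'": 2,
    "bōenkyō 'telescope'": 2,
    "nozomu 'see'": 1,
}


def cumulative_shares(counts):
    """Return (label, count, cumulative share) tuples in descending order of count."""
    total = sum(counts.values())
    running = 0
    table = []
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        table.append((label, count, running / total))
    return table


table = cumulative_shares(collocates)
for rank, (label, count, cum) in enumerate(table, start=1):
    print(f"{rank:>2}  {label:<28} {count:>3}  {cum:.0%}")
```

The first two of the nine types already account for roughly three quarters of the mentions, the steep head and long tail characteristic of an A-curve.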