Natural Language Machine Learning Terminology

2018. 7. 31. 11:41

https://console.bluemix.net/docs/services/watson-knowledge-studio/glossary.html#gloss_M

annotation

Information about a span of text. For example, an annotation might indicate that a span of text represents a company name.
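
As a rough illustration, an annotation can be modeled as a label attached to character offsets in the text. The `Annotation` class, its field names, and the `COMPANY_NAME` label below are hypothetical and are not Watson Knowledge Studio's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A label attached to a span of text (hypothetical structure)."""
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span
    label: str  # what the span represents, e.g. "COMPANY_NAME"

    def covered_text(self, text: str) -> str:
        """Return the part of the text that this annotation covers."""
        return text[self.start:self.end]


text = "IBM was founded in 1911."
company = Annotation(start=0, end=3, label="COMPANY_NAME")
print(company.label, company.covered_text(text))
```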

annotation set

In human annotation, a collection of documents extracted from the corpus so that the workload can be shared by multiple human annotators. In machine-based annotation, a collection of documents that can be used as blind data, training data, or test data.

human annotator

A subject matter expert who reviews, modifies, and augments the results of pre-annotation by identifying mentions, entity type relationships, and mention coreferences. By examining text in context, a human annotator helps determine ground truth and improve the accuracy of the machine learning model.

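[Human annotation]: A method in which a person reviews the documents and manually assigns entities and relations to the words and relationships found in them. It is arguably the most reliable method, but it takes time and is subject to human error. Having several human annotators work on the same documents can improve accuracy, but the annotators may then produce contradictory annotations (called annotation conflicts), which must ultimately be resolved.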
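
A minimal sketch of how conflicts between two annotators could be detected, assuming each annotator's work is stored as a mapping from a `(start, end)` span to an entity type. The function name and data layout are assumptions for illustration; the actual tool resolves conflicts through its own adjudication screen.

```python
def find_conflicts(annotator_a, annotator_b):
    """Return spans that the two annotators labeled differently.

    A span labeled by only one annotator also counts as a conflict;
    the missing side is reported as None.
    """
    conflicts = {}
    for span in sorted(set(annotator_a) | set(annotator_b)):
        label_a = annotator_a.get(span)
        label_b = annotator_b.get(span)
        if label_a != label_b:
            conflicts[span] = (label_a, label_b)
    return conflicts


# Two annotators working on the sentence "Mary works for IBM"
a = {(0, 4): "PERSON", (15, 18): "ORGANIZATION"}
b = {(0, 4): "PERSON", (15, 18): "LOCATION"}
conflicts = find_conflicts(a, b)
print(conflicts)
```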

entity

A mention that is annotated by an entity type.
A person, object, or concept about which information is stored.
A set of details that are held about a real-world object such as a person, location, or bank account. An entity is a kind of item.

entity type

The type of entity that a mention represents without consideration for context. For example, the mention IBM might be annotated by the entity type ORGANIZATION.

In an entity-relationship model, an entity type is the thing that is being modeled or the thing that a mention refers to, such as the name of a person or place. Different entity types have different sets of attributes such as "surname" or "home town", and are connected through relationships like "lives in". An entity type exists independently and can be uniquely identified.

[entity type] classifies words by kind. For example:

- Mary, Bob, Thomas Watson, and President Obama are annotated with the PERSON entity type.

- IBM, Google, and Microsoft are annotated with the ORGANIZATION entity type.

relation

Typically a verb that reflects how entities are related to one another. For example, "lives in" is a relation between a person and a town. A relation links two different entities in the same sentence.

relation type

A binary, uni-directional relationship between two entities. For example, Mary employedBy IBM is a valid relationship; IBM employedBy Mary is not.

[relation type] defines an ordered relationship between two entities. For example:

- From the sentence "Mary works for IBM", the employedBy relation type is annotated.

- From the phrase "IBM founder Thomas Watson", the founderOf relation type is annotated.
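
Because a relation type is binary and uni-directional, the order of the two entities matters. The sketch below shows one way to check that order; the `RelationType` tuple and `is_valid` helper are hypothetical names, and the entity types PERSON and ORGANIZATION for employedBy are taken from the examples in this glossary.

```python
from typing import NamedTuple


class RelationType(NamedTuple):
    """An ordered relation between two entity types (hypothetical structure)."""
    name: str
    first_type: str   # entity type allowed on the left side
    second_type: str  # entity type allowed on the right side


EMPLOYED_BY = RelationType("employedBy", "PERSON", "ORGANIZATION")


def is_valid(relation_type, first_entity_type, second_entity_type):
    """Check that the entity types appear in the order the relation type requires."""
    expected = (relation_type.first_type, relation_type.second_type)
    return (first_entity_type, second_entity_type) == expected


entity_types = {"Mary": "PERSON", "IBM": "ORGANIZATION"}
# "Mary employedBy IBM" is valid
print(is_valid(EMPLOYED_BY, entity_types["Mary"], entity_types["IBM"]))
# "IBM employedBy Mary" is not
print(is_valid(EMPLOYED_BY, entity_types["IBM"], entity_types["Mary"]))
```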

rule set

A set of rules that define patterns for annotating text. If a pattern applies, then the actions of the rule are performed on the matched annotations. A rule typically specifies the condition that must match, an optional quantifier, a list of additional constraints that the matched text must fulfill, and the actions to be taken when a match occurs, such as creating a new annotation or modifying an existing annotation.
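
The idea of "a condition that must match, plus an action" can be sketched with plain regular expressions. This is only an analogy written with Python's `re` module, not the rule editor's real syntax; the two rules and the labels `DATE` and `ACRONYM` are made up for the example.

```python
import re

# Each hypothetical rule pairs a condition (a regex) with an action
# (the label of the annotation to create when the condition matches).
RULES = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "DATE"),
    (re.compile(r"\b[A-Z]{2,}\b"), "ACRONYM"),
]


def apply_rules(text):
    """Create a (start, end, label) annotation for every rule match."""
    annotations = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), label))
    return sorted(annotations)


text = "IBM released it on 2018-07-31."
for start, end, label in apply_rules(text):
    print(label, text[start:end])
```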

lemma

The normalized or canonical form of a word. Typically, the lemma is the underived and uninflected form of a noun or a verb. For example, the lemma of the terms 'organizing' and 'organized' is 'organize'. See also dictionary and surface form.
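
A minimal sketch of the lemma and surface form relationship, using a hand-written lookup table. The table and the `lemma_of` helper are assumptions for illustration; falling back to the lowercased word for unknown input is a choice made here, not something the glossary specifies.

```python
# Hand-written table mapping surface forms to their lemma
SURFACE_TO_LEMMA = {
    "organizing": "organize",
    "organized": "organize",
    "organize": "organize",
}


def lemma_of(word):
    """Look up the lemma of a surface form; unknown words fall back to lowercase."""
    key = word.lower()
    return SURFACE_TO_LEMMA.get(key, key)


print(lemma_of("Organizing"), lemma_of("organized"))
```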

dictionary

A collection of words that can be used to pre-annotate documents. A new annotation is created for each word in the document text that matches a term in the dictionary. A machine learning model can be configured with one or more independent dictionaries, which are typically domain-specific, such as a dictionary for pharmaceuticals and a dictionary for wealth management. See also lemma and surface form.

[Dictionary]: A dictionary contains the lemmas of words used in the documents, along with their synonyms and parts of speech.

dictionary pre-annotator

A component that identifies mentions in text that match a specific set of words. By using domain-specific terminology to pre-annotate text, dictionary pre-annotators can accelerate a human annotator's ability to prepare a set of ground truth documents.

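[Dictionary-based annotation]: A method that compares a dictionary against the documents and, when a word in the dictionary also appears in a document, assigns the entity type associated with that dictionary to the word. It works well when the documents contain proper nouns or specialized terms, but it may classify words with multiple meanings incorrectly (for example, the Japanese homonyms hashi (橋, "bridge") and hashi (端, "edge")). In addition, dictionary-based annotation does not annotate relations, so these must be added manually through human annotation.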
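
The matching step can be sketched as a simple lookup of surface forms in the text. The dictionary layout, the `pre_annotate` function, and the sample entries are hypothetical; the sketch only assigns entity types, which mirrors the limitation that relations are not annotated by this method.

```python
import re

# Hypothetical dictionary: one entity type, lemmas mapped to surface forms
ORG_DICTIONARY = {
    "entity_type": "ORGANIZATION",
    "entries": {
        "IBM": ["IBM", "International Business Machines"],
        "Google": ["Google"],
    },
}


def pre_annotate(text, dictionary):
    """Return (start, end, entity_type, lemma) for every dictionary match."""
    mentions = []
    for lemma, surface_forms in dictionary["entries"].items():
        for form in surface_forms:
            # Whole-word match only, so "IBM" does not match inside "IBMer"
            pattern = r"\b" + re.escape(form) + r"\b"
            for match in re.finditer(pattern, text):
                mentions.append(
                    (match.start(), match.end(), dictionary["entity_type"], lemma)
                )
    return sorted(mentions)


text = "Mary works for IBM, not Google."
mentions = pre_annotate(text, ORG_DICTIONARY)
for start, end, entity_type, lemma in mentions:
    print(text[start:end], entity_type, lemma)
```

Note that "Mary" receives no annotation because no PERSON dictionary was supplied, and a plain string match like this has no way to tell homonyms apart.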

document set

A collection of documents. Documents that are imported together become a document set. Annotated documents that are grouped together for training purposes (Test, Train, Blind) are generated as document sets.
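
Grouping annotated documents into Train, Test, and Blind sets could look roughly like the following. The 70/20/10 ratio, the fixed seed, and the `split_document_set` name are assumptions chosen for the example, not values taken from the product.

```python
import random


def split_document_set(documents, train=0.7, test=0.2, seed=0):
    """Shuffle the documents and split them into train, test, and blind sets.

    Whatever remains after the train and test portions becomes the blind set.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "blind": shuffled[n_train + n_test:],
    }


documents = [f"doc{i}.txt" for i in range(10)]
sets = split_document_set(documents)
print({name: len(docs) for name, docs in sets.items()})
```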

machine learning model/machine learning annotator

A component that identifies entities and entity relationships according to a statistical model that is based on ground truth. The model applies past experience, such as training data, to determine or predict the correct outcome of future experiences based on characteristics of the data. These past experiences are captured in the form of a model by calculating feature scores for each candidate answer or evidence and combining that with known outcomes. Sometimes referred to as machine learning annotator.

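[Machine learning annotation]: A method in which only the existing documents are annotated first, then a machine learning model is created and trained on the annotation results, and that model is used to automatically annotate new, unseen documents that are added later. It is effective when the new documents are similar to the documents used for training.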

Class

A class contains dictionaries, regular expressions, or rules. A rule is a larger, rule-based construct that is defined by combining dictionaries and regular expressions.

Attribution, NonCommercial, NoDerivatives
