Meteos

Support text format in classification model and clustering model

Registered by Hiroyuki Eguchi on 2017-03-09

Currently, these models support only non-text input formats.
This blueprint aims to add support for text input using tf–idf [1].

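tf–idf is a text-mining technique that turns documents into numerical feature vectors, weighting each term by how often it appears in a document and how rare it is across the whole collection.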
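
For reference, the weighting described in the Spark MLlib documentation [1] can be written as follows, where t is a term, d a document, D the corpus, tf(t, d) the number of times t appears in d, and df(t, D) the number of documents containing t. This restates the definition in [1]; it does not describe how Meteos itself calls the library.

```latex
\mathrm{idf}(t, D) = \log \frac{|D| + 1}{\mathrm{df}(t, D) + 1},
\qquad
\text{tf--idf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```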

This feature allows users to create prediction models such as the following.

- a model that detects whether an email is spam
- a model that predicts whether a review is favorable
- a model that detects the language a document is written in
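
As a rough illustration of how text could become numeric input for such models, here is a minimal pure-Python sketch of tf–idf weighting. It is not the Meteos or Spark implementation: the function name `tf_idf` and the sample documents are hypothetical, tokenization is a plain whitespace split, and terms are kept in a dictionary instead of being hashed into a fixed-size vector as Spark's HashingTF does. Only the smoothed IDF formula is taken from [1].

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(tokenized)

    # Smoothed inverse document frequency, following the formula in [1].
    idf = {term: math.log((n_docs + 1) / (count + 1))
           for term, count in df.items()}

    # Weight = raw term count in the document times the term's IDF.
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Made-up sample documents, for illustration only.
docs = [
    "win a free prize now",
    "free prize inside claim now",
    "meeting agenda for monday",
]
weights = tf_idf(docs)
```

In this sample, "free" appears in two of the three documents and "win" in only one, so "win" receives the larger weight in the first document. The resulting weights are the kind of numeric features a classifier or clustering algorithm could then consume.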

[1] https://spark.apache.org/docs/1.6.3/mllib-feature-extraction.html#tf-idf

Blueprint information

Status: Not started

Approver: None

Priority: Medium

Drafter: Hiroyuki Eguchi

Direction: Needs approval

Assignee: Hiroyuki Eguchi

Definition: New

Series goal: None

Implementation: Unknown

Milestone target: None

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/support-text-format,n,z

Addressed by: https://review.openstack.org/445699
Enable NaiveBayes to support a text format

Addressed by: https://review.openstack.org/448419
Enable KMeans to support a text format
