Support text format in classification model and clustering model

Registered by Hiroyuki Eguchi

Currently, the classification and clustering models accept only non-text input formats.
This blueprint aims to add support for text input using tf–idf [1].

tf–idf is a text-mining technique that turns documents into numerical feature vectors by weighting each term by how often it occurs in a document and how rare it is across all documents.
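
To make the weighting concrete, here is a minimal pure-Python sketch of tf–idf. It is an illustration only, not the code proposed for this blueprint: the function name tf_idf and the sample documents are hypothetical, tokenization is a plain whitespace split, and the term-hashing step that Spark MLlib applies is left out. The smoothed IDF formula, log((m + 1) / (df + 1)), is the one described in the MLlib documentation [1].

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: whitespace tokenization is enough for this illustration.
    tokenized = [doc.lower().split() for doc in documents]
    num_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed inverse document frequency: log((m + 1) / (df + 1)).
    idf = {
        term: math.log((num_docs + 1) / (df + 1))
        for term, df in doc_freq.items()
    }

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # raw count of each term in this document
        vectors.append({term: tf * idf[term] for term, tf in term_freq.items()})
    return vectors


# Made-up sample documents.
docs = [
    "win money now",
    "meeting schedule for monday",
    "win a free prize now",
]
weights = tf_idf(docs)
```

In this sample, "money" occurs in only one document while "win" occurs in two, so "money" gets the larger weight in the first document.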

This feature allows users to create prediction models such as the following:

- a model that detects whether a mail is spam
- a model that predicts whether a review is favorable
- a model that detects which language a document is written in
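
As a rough illustration of the first use case, the sketch below trains a small multinomial Naive Bayes spam detector in pure Python. It is an assumption-laden example rather than the implementation under review: the train and predict functions and the four training mails are hypothetical, and raw term counts stand in for tf–idf features to keep the example short.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model from (label, text) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_counts.values())
    model = {}
    for label in class_counts:
        # Laplace (add-alpha) smoothing over the whole vocabulary.
        denom = sum(term_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(class_counts[label] / total_docs),
            "log_likelihood": {
                term: math.log((term_counts[label][term] + alpha) / denom)
                for term in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest log-probability score."""
    tokens = text.lower().split()
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for term in tokens:
            # Terms never seen in training are skipped.
            if term in params["log_likelihood"]:
                score += params["log_likelihood"][term]
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training mails: "spam" or "ham" (not spam).
training = [
    ("spam", "win money now"),
    ("spam", "free prize win now"),
    ("ham", "meeting schedule for monday"),
    ("ham", "lunch on monday"),
]
model = train(training)
label = predict(model, "win a free prize")
```

The same pattern applies to the other two use cases by changing the labels, for example favorable/unfavorable for reviews or a language code for language detection.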

[1] https://spark.apache.org/docs/1.6.3/mllib-feature-extraction.html#tf-idf

Blueprint information

Status: Not started
Approver: None
Priority: Medium
Drafter: Hiroyuki Eguchi
Direction: Needs approval
Assignee: Hiroyuki Eguchi
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/support-text-format,n,z

Addressed by: https://review.openstack.org/445699
    Enable NaiveBayes to support a text format

Addressed by: https://review.openstack.org/448419
    Enable KMeans to support a text format
