Merge lp:~bac/charmworld/ngram-inquiry into lp:charmworld

Proposed by Brad Crittenden
Status: Merged
Approved by: Brad Crittenden
Approved revision: 516
Merged at revision: 516
Proposed branch: lp:~bac/charmworld/ngram-inquiry
Merge into: lp:charmworld
Diff against target: 144 lines (+72/-9)
2 files modified
charmworld/search.py (+6/-7)
charmworld/tests/test_search.py (+66/-2)
To merge this branch: bzr merge lp:~bac/charmworld/ngram-inquiry
Reviewer Review Type Date Requested Status
Juju Gui Bot continuous-integration Approve
Madison Scott-Clary (community) Approve
Review via email: mp+226910@code.launchpad.net

Commit message

Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1) using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page now
uses ngrams too.

https://codereview.appspot.com/117810044/

R=makyo

Description of the change

Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1) using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page now
uses ngrams too.

QA:

First, ingest some data:
# ttyn
$ make run

# ttyn+1
$ bin/es-update
$ bin/ingest-queued --prefix=~cabs-team/ ## gets sugarcrm
$ bin/ingest-queued --prefix=~charmers/charms/ ## gets a bunch of charms
$ bin/ingest-queued --prefix=~bac ## gets a bundle

Without autocomplete an ngram search is done in addition to the other
fields. This search will find items with 'popular' in the description.
http://127.0.0.1:2464/api/3/search?text=popular

But turning on autocomplete limits to the ngram of the item name. Nothing
found.
http://127.0.0.1:2464/api/3/search?text=popular&autocomplete=true

These two should find items with names and free text, followed by just names.
http://127.0.0.1:2464/api/3/search?text=sql
http://127.0.0.1:2464/api/3/search?text=sql&autocomplete=true

This is the entry point used by the search box on the web page.
http://127.0.0.1:2464/api/3/search?text=sql

https://codereview.appspot.com/117810044/

To post a comment you must log in.
Revision history for this message
Brad Crittenden (bac) wrote :
Download full text (7.9 KiB)

Reviewers: mp+226910_code.launchpad.net,

Message:
Please take a look.

Description:
Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1)
using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it
when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for
the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and
passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page
now
uses ngrams too.

QA:

First, ingest some data:
# ttyn
$ make run

# ttyn+1
$ bin/es-update
$ bin/ingest-queued --prefix=~cabs-team/ ## gets sugarcrm
$ bin/ingest-queued --prefix=~charmers/charms/ ## gets a bunch of charms
$ bin/ingest-queued --prefix=~bac ## gets a bundle

Without autocomplete an ngram search is done in addition to the other
fields. This search will find items with 'popular' in the description.
http://127.0.0.1:2464/api/3/search?text=popular

But turning on autocomplete limits to the ngram of the item name.
Nothing
found.
http://127.0.0.1:2464/api/3/search?text=popular&autocomplete=true

These two should find items with names and free text, followed by just
names.
http://127.0.0.1:2464/api/3/search?text=sql
http://127.0.0.1:2464/api/3/search?text=sql&autocomplete=true

This is the entry point used by the search box on the web page.
http://127.0.0.1:2464/api/3/search?text=sql

https://code.launchpad.net/~bac/charmworld/ngram-inquiry/+merge/226910

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/117810044/

Affected files (+74, -9 lines):
   A [revision details]
   M charmworld/search.py
   M charmworld/tests/test_search.py

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140521163857-dgt0onkdb3begqkj
+New revision: <email address hidden>

Index: charmworld/search.py
=== modified file 'charmworld/search.py'
--- charmworld/search.py 2014-05-19 23:44:29 +0000
+++ charmworld/search.py 2014-07-15 19:09:59 +0000
@@ -32,7 +32,6 @@
  ]
  charm_free_text_fields = {
      'name': 10,
- 'ngrams': None,
      'summary': 5,
      'description': 3,
      'config.options.description': None,
@@ -46,7 +45,6 @@
  ]
  bundle_free_text_fields = {
      'name': 10,
- 'ngrams': None,
      'basket_name': 5,
      'description': 3,
      'title': None,
@@ -353,7 +351,7 @@
                  "fields": {
                      "ngrams": {
                          "type": "string",
- "analyzer": "n3_20grams"
+ "analyzer": "n3_20grams",
                      },
                      "name": {
                          "type": "string",
@@ -490,12 +488,13 @@

              charm_dsl = {'filtered': {
                  ...

Read more...

Revision history for this message
Madison Scott-Clary (makyo) wrote :

Code LGTM and QA okay, thanks for the fix.

review: Approve
Revision history for this message
Juju Gui Bot (juju-gui-bot) wrote :

FAILED: Autolanding.
No commit message was specified in the merge proposal. Hit 'Add commit message' on the merge proposal web page or follow the link below. You can approve the merge proposal yourself to rerun.
https://code.launchpad.net/~bac/charmworld/ngram-inquiry/+merge/226910/+edit-commit-message

review: Needs Fixing (continuous-integration)
Revision history for this message
Juju Gui Bot (juju-gui-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'charmworld/search.py'
2--- charmworld/search.py 2014-05-19 23:44:29 +0000
3+++ charmworld/search.py 2014-07-15 19:51:00 +0000
4@@ -32,7 +32,6 @@
5 ]
6 charm_free_text_fields = {
7 'name': 10,
8- 'ngrams': None,
9 'summary': 5,
10 'description': 3,
11 'config.options.description': None,
12@@ -46,7 +45,6 @@
13 ]
14 bundle_free_text_fields = {
15 'name': 10,
16- 'ngrams': None,
17 'basket_name': 5,
18 'description': 3,
19 'title': None,
20@@ -353,7 +351,7 @@
21 "fields": {
22 "ngrams": {
23 "type": "string",
24- "analyzer": "n3_20grams"
25+ "analyzer": "n3_20grams",
26 },
27 "name": {
28 "type": "string",
29@@ -490,12 +488,13 @@
30
31 charm_dsl = {'filtered': {
32 'query': make_query(
33- charm_fields, ['requires.*', 'provides.*']),
34- 'filter': {'type': {'value': CHARM}}
35+ charm_fields, ['ngrams', 'requires.*', 'provides.*']),
36+ 'filter': {'type': {'value': CHARM}},
37 }}
38 bundle_dsl = {'filtered': {
39- 'query': make_query(bundle_fields),
40- 'filter': {'type': {'value': BUNDLE}}
41+ 'query': make_query(
42+ bundle_fields, ['ngrams']),
43+ 'filter': {'type': {'value': BUNDLE}},
44 }}
45 # Union the bundle_dsl and charm_dsl results
46 dsl = {'bool': {'should': [bundle_dsl, charm_dsl]}}
47
48=== modified file 'charmworld/tests/test_search.py'
49--- charmworld/tests/test_search.py 2014-05-14 17:59:37 +0000
50+++ charmworld/tests/test_search.py 2014-07-15 19:51:00 +0000
51@@ -187,13 +187,33 @@
52 def test_ngrams_no_matching_ngrams(self):
53 client = self.index_client
54 test_value = "blarglefoafblatherinfinatum"
55- foo_charm = Charm(self.makeCharm(name=test_value))
56- foo_charm = foo_charm # shut it pyflakes
57+ Charm(self.makeCharm(name=test_value))
58 query = {"query": {"match":
59 {"ngrams": "blvlsdkjfa;lsdkghoifdhvsoidufhvsdececewv"}}}
60 results = client._client.search(query, index=client.index_name)
61 self.assertEquals(results['hits']['total'], 0)
62
63+ def test_search_charm_ngrams_found(self):
64+ client = self.index_client
65+ test_value = "blarglefoafblatherinfinatum"
66+ Charm(self.makeCharm(name=test_value))
67+ results = client.search('foaf')['results']
68+ self.assertEqual(1, len(results[CHARM]))
69+
70+ def test_search_charm_no_matching_ngrams(self):
71+ client = self.index_client
72+ test_value = "blarglefoafblatherinfinatum"
73+ Charm(self.makeCharm(name=test_value))
74+ results = client.search('schmoo')['results']
75+ self.assertEqual(0, len(results[CHARM]))
76+
77+ def test_search_charm_ngrams_do_not_match_description(self):
78+ client = self.index_client
79+ name = "blarglefoafblatherinfinatum"
80+ Charm(self.makeCharm(name=name, description='supermoon'))
81+ results = client.search('moon')['results']
82+ self.assertEqual(0, len(results[CHARM]))
83+
84 def test_search_bundle(self):
85 bundle = Bundle(self.makeBundle())
86 client = self.index_client
87@@ -325,6 +345,27 @@
88 self.assertEqual(2, len(results['bundle']))
89 self.assertEqual(0, len(results['charm']))
90
91+ def test_search_bundle_ngrams_found(self):
92+ client = self.index_client
93+ test_value = "blarglefoafblatherinfinatum"
94+ Bundle(self.makeBundle(name=test_value))
95+ results = client.search('foaf')['results']
96+ self.assertEqual(1, len(results[BUNDLE]))
97+
98+ def test_search_bundle_no_matching_ngrams(self):
99+ client = self.index_client
100+ test_value = "blarglefoafblatherinfinatum"
101+ Bundle(self.makeBundle(name=test_value))
102+ results = client.search('schmoo')['results']
103+ self.assertEqual(0, len(results[BUNDLE]))
104+
105+ def test_search_bundle_ngrams_do_not_match_description(self):
106+ client = self.index_client
107+ name = "blarglefoafblatherinfinatum"
108+ Bundle(self.makeBundle(name=name, description='supermoon'))
109+ results = client.search('moon')['results']
110+ self.assertEqual(0, len(results[BUNDLE]))
111+
112 def test_searching_for_charms_only(self):
113 charm = Charm(self.makeCharm(description='a mozilla charm'))
114 Bundle(self.makeBundle(description='a mozilla bundle'))
115@@ -577,6 +618,29 @@
116 'bla', autocomplete=True, min_score=0)
117 self.assertEquals(result, [])
118
119+ def test_ngram_match(self):
120+ # Without autocomplete.
121+ charm1 = self.makeCharm(name='foo')
122+ charm2 = self.makeCharm(name='foobar')
123+ charm3 = self.makeCharm(name='barfoo')
124+ result = self.index_client.api_search(
125+ 'foo', autocomplete=False, min_score=0)
126+ ids = [r['data']['_id'] for r in result]
127+ self.assertItemsEqual([
128+ charm1['_id'],
129+ charm2['_id'],
130+ charm3['_id']],
131+ ids)
132+
133+ def test_ngram_miss(self):
134+ # Without autocomplete.
135+ self.makeCharm(name='foo')
136+ self.makeCharm(name='foobar')
137+ self.makeCharm(name='barfoo')
138+ result = self.index_client.api_search(
139+ 'bla', autocomplete=False, min_score=0)
140+ self.assertEquals(result, [])
141+
142 def test_special_characters_return_no_results(self):
143 self.makeCharm(name='foo')
144 result = self.index_client.api_search(

Subscribers

People subscribed via source and target branches

to all changes: