Merge lp:~bac/charmworld/ngram-inquiry into lp:charmworld

Proposed by Brad Crittenden
Status: Merged
Approved by: Brad Crittenden
Approved revision: 516
Merged at revision: 516
Proposed branch: lp:~bac/charmworld/ngram-inquiry
Merge into: lp:charmworld
Diff against target: 144 lines (+72/-9)
2 files modified
charmworld/search.py (+6/-7)
charmworld/tests/test_search.py (+66/-2)
To merge this branch: bzr merge lp:~bac/charmworld/ngram-inquiry
Reviewer Review Type Date Requested Status
Juju Gui Bot continuous-integration Approve
Madison Scott-Clary (community) Approve
Review via email: mp+226910@code.launchpad.net

Commit message

Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1) using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page now
uses ngrams too.

https://codereview.appspot.com/117810044/

R=makyo

Description of the change

Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1) using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page now
uses ngrams too.

QA:

First, ingest some data:
# ttyn
$ make run

# ttyn+1
$ bin/es-update
$ bin/ingest-queued --prefix=~cabs-team/ ## gets sugarcrm
$ bin/ingest-queued --prefix=~charmers/charms/ ## gets a bunch of charms
$ bin/ingest-queued --prefix=~bac ## gets a bundle

Without autocomplete an ngram search is done in addition to the other
fields. This search will find items with 'popular' in the description.
http://127.0.0.1:2464/api/3/search?text=popular

But turning on autocomplete limits to the ngram of the item name. Nothing
found.
http://127.0.0.1:2464/api/3/search?text=popular&autocomplete=true

These two should find items with names and free text, followed by just names.
http://127.0.0.1:2464/api/3/search?text=sql
http://127.0.0.1:2464/api/3/search?text=sql&autocomplete=true

This is the entry point used by the search box on the web page.
http://127.0.0.1:2464/api/3/search?text=sql

https://codereview.appspot.com/117810044/

To post a comment you must log in.
Revision history for this message
Brad Crittenden (bac) wrote :
Download full text (7.9 KiB)

Reviewers: mp+226910_code.launchpad.net,

Message:
Please take a look.

Description:
Make ngram searches work.

After the ngram search was deployed it was discovered to only work if 1)
using
the API and 2) had autocomplete=true.

Inspecting the code showed the intent for ngram to be used in other
situations, and there was not documented rationale for only using it
when
doing autocomplete.

The problem was simply in the query for non-autocomplete the field for
the
ngrams was prefixed with 'data.' when it should not have been.

The ngrams fields were moved out of the charm and bundle fields and
passed as
separate fields, treated like 'provides' and 'requires', to avoid the
prefixing.

As a result of this change, the search on the manage.jujucharms.com page
now
uses ngrams too.

QA:

First, ingest some data:
# ttyn
$ make run

# ttyn+1
$ bin/es-update
$ bin/ingest-queued --prefix=~cabs-team/ ## gets sugarcrm
$ bin/ingest-queued --prefix=~charmers/charms/ ## gets a bunch of charms
$ bin/ingest-queued --prefix=~bac ## gets a bundle

Without autocomplete an ngram search is done in addition to the other
fields. This search will find items with 'popular' in the description.
http://127.0.0.1:2464/api/3/search?text=popular

But turning on autocomplete limits to the ngram of the item name.
Nothing
found.
http://127.0.0.1:2464/api/3/search?text=popular&autocomplete=true

These two should find items with names and free text, followed by just
names.
http://127.0.0.1:2464/api/3/search?text=sql
http://127.0.0.1:2464/api/3/search?text=sql&autocomplete=true

This is the entry point used by the search box on the web page.
http://127.0.0.1:2464/api/3/search?text=sql

https://code.launchpad.net/~bac/charmworld/ngram-inquiry/+merge/226910

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/117810044/

Affected files (+74, -9 lines):
   A [revision details]
   M charmworld/search.py
   M charmworld/tests/test_search.py

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20140521163857-dgt0onkdb3begqkj
+New revision: <email address hidden>

Index: charmworld/search.py
=== modified file 'charmworld/search.py'
--- charmworld/search.py 2014-05-19 23:44:29 +0000
+++ charmworld/search.py 2014-07-15 19:09:59 +0000
@@ -32,7 +32,6 @@
  ]
  charm_free_text_fields = {
      'name': 10,
- 'ngrams': None,
      'summary': 5,
      'description': 3,
      'config.options.description': None,
@@ -46,7 +45,6 @@
  ]
  bundle_free_text_fields = {
      'name': 10,
- 'ngrams': None,
      'basket_name': 5,
      'description': 3,
      'title': None,
@@ -353,7 +351,7 @@
                  "fields": {
                      "ngrams": {
                          "type": "string",
- "analyzer": "n3_20grams"
+ "analyzer": "n3_20grams",
                      },
                      "name": {
                          "type": "string",
@@ -490,12 +488,13 @@

              charm_dsl = {'filtered': {
                  ...

Read more...

Revision history for this message
Madison Scott-Clary (makyo) wrote :

Code LGTM and QA okay, thanks for the fix.

review: Approve
Revision history for this message
Juju Gui Bot (juju-gui-bot) wrote :

FAILED: Autolanding.
No commit message was specified in the merge proposal. Hit 'Add commit message' on the merge proposal web page or follow the link below. You can approve the merge proposal yourself to rerun.
https://code.launchpad.net/~bac/charmworld/ngram-inquiry/+merge/226910/+edit-commit-message

review: Needs Fixing (continuous-integration)
Revision history for this message
Juju Gui Bot (juju-gui-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'charmworld/search.py'
--- charmworld/search.py 2014-05-19 23:44:29 +0000
+++ charmworld/search.py 2014-07-15 19:51:00 +0000
@@ -32,7 +32,6 @@
32]32]
33charm_free_text_fields = {33charm_free_text_fields = {
34 'name': 10,34 'name': 10,
35 'ngrams': None,
36 'summary': 5,35 'summary': 5,
37 'description': 3,36 'description': 3,
38 'config.options.description': None,37 'config.options.description': None,
@@ -46,7 +45,6 @@
46]45]
47bundle_free_text_fields = {46bundle_free_text_fields = {
48 'name': 10,47 'name': 10,
49 'ngrams': None,
50 'basket_name': 5,48 'basket_name': 5,
51 'description': 3,49 'description': 3,
52 'title': None,50 'title': None,
@@ -353,7 +351,7 @@
353 "fields": {351 "fields": {
354 "ngrams": {352 "ngrams": {
355 "type": "string",353 "type": "string",
356 "analyzer": "n3_20grams"354 "analyzer": "n3_20grams",
357 },355 },
358 "name": {356 "name": {
359 "type": "string",357 "type": "string",
@@ -490,12 +488,13 @@
490488
491 charm_dsl = {'filtered': {489 charm_dsl = {'filtered': {
492 'query': make_query(490 'query': make_query(
493 charm_fields, ['requires.*', 'provides.*']),491 charm_fields, ['ngrams', 'requires.*', 'provides.*']),
494 'filter': {'type': {'value': CHARM}}492 'filter': {'type': {'value': CHARM}},
495 }}493 }}
496 bundle_dsl = {'filtered': {494 bundle_dsl = {'filtered': {
497 'query': make_query(bundle_fields),495 'query': make_query(
498 'filter': {'type': {'value': BUNDLE}}496 bundle_fields, ['ngrams']),
497 'filter': {'type': {'value': BUNDLE}},
499 }}498 }}
500 # Union the bundle_dsl and charm_dsl results499 # Union the bundle_dsl and charm_dsl results
501 dsl = {'bool': {'should': [bundle_dsl, charm_dsl]}}500 dsl = {'bool': {'should': [bundle_dsl, charm_dsl]}}
502501
=== modified file 'charmworld/tests/test_search.py'
--- charmworld/tests/test_search.py 2014-05-14 17:59:37 +0000
+++ charmworld/tests/test_search.py 2014-07-15 19:51:00 +0000
@@ -187,13 +187,33 @@
187 def test_ngrams_no_matching_ngrams(self):187 def test_ngrams_no_matching_ngrams(self):
188 client = self.index_client188 client = self.index_client
189 test_value = "blarglefoafblatherinfinatum"189 test_value = "blarglefoafblatherinfinatum"
190 foo_charm = Charm(self.makeCharm(name=test_value))190 Charm(self.makeCharm(name=test_value))
191 foo_charm = foo_charm # shut it pyflakes
192 query = {"query": {"match":191 query = {"query": {"match":
193 {"ngrams": "blvlsdkjfa;lsdkghoifdhvsoidufhvsdececewv"}}}192 {"ngrams": "blvlsdkjfa;lsdkghoifdhvsoidufhvsdececewv"}}}
194 results = client._client.search(query, index=client.index_name)193 results = client._client.search(query, index=client.index_name)
195 self.assertEquals(results['hits']['total'], 0)194 self.assertEquals(results['hits']['total'], 0)
196195
196 def test_search_charm_ngrams_found(self):
197 client = self.index_client
198 test_value = "blarglefoafblatherinfinatum"
199 Charm(self.makeCharm(name=test_value))
200 results = client.search('foaf')['results']
201 self.assertEqual(1, len(results[CHARM]))
202
203 def test_search_charm_no_matching_ngrams(self):
204 client = self.index_client
205 test_value = "blarglefoafblatherinfinatum"
206 Charm(self.makeCharm(name=test_value))
207 results = client.search('schmoo')['results']
208 self.assertEqual(0, len(results[CHARM]))
209
210 def test_search_charm_ngrams_do_not_match_description(self):
211 client = self.index_client
212 name = "blarglefoafblatherinfinatum"
213 Charm(self.makeCharm(name=name, description='supermoon'))
214 results = client.search('moon')['results']
215 self.assertEqual(0, len(results[CHARM]))
216
197 def test_search_bundle(self):217 def test_search_bundle(self):
198 bundle = Bundle(self.makeBundle())218 bundle = Bundle(self.makeBundle())
199 client = self.index_client219 client = self.index_client
@@ -325,6 +345,27 @@
325 self.assertEqual(2, len(results['bundle']))345 self.assertEqual(2, len(results['bundle']))
326 self.assertEqual(0, len(results['charm']))346 self.assertEqual(0, len(results['charm']))
327347
348 def test_search_bundle_ngrams_found(self):
349 client = self.index_client
350 test_value = "blarglefoafblatherinfinatum"
351 Bundle(self.makeBundle(name=test_value))
352 results = client.search('foaf')['results']
353 self.assertEqual(1, len(results[BUNDLE]))
354
355 def test_search_bundle_no_matching_ngrams(self):
356 client = self.index_client
357 test_value = "blarglefoafblatherinfinatum"
358 Bundle(self.makeBundle(name=test_value))
359 results = client.search('schmoo')['results']
360 self.assertEqual(0, len(results[BUNDLE]))
361
362 def test_search_bundle_ngrams_do_not_match_description(self):
363 client = self.index_client
364 name = "blarglefoafblatherinfinatum"
365 Bundle(self.makeBundle(name=name, description='supermoon'))
366 results = client.search('moon')['results']
367 self.assertEqual(0, len(results[BUNDLE]))
368
328 def test_searching_for_charms_only(self):369 def test_searching_for_charms_only(self):
329 charm = Charm(self.makeCharm(description='a mozilla charm'))370 charm = Charm(self.makeCharm(description='a mozilla charm'))
330 Bundle(self.makeBundle(description='a mozilla bundle'))371 Bundle(self.makeBundle(description='a mozilla bundle'))
@@ -577,6 +618,29 @@
577 'bla', autocomplete=True, min_score=0)618 'bla', autocomplete=True, min_score=0)
578 self.assertEquals(result, [])619 self.assertEquals(result, [])
579620
621 def test_ngram_match(self):
622 # Without autocomplete.
623 charm1 = self.makeCharm(name='foo')
624 charm2 = self.makeCharm(name='foobar')
625 charm3 = self.makeCharm(name='barfoo')
626 result = self.index_client.api_search(
627 'foo', autocomplete=False, min_score=0)
628 ids = [r['data']['_id'] for r in result]
629 self.assertItemsEqual([
630 charm1['_id'],
631 charm2['_id'],
632 charm3['_id']],
633 ids)
634
635 def test_ngram_miss(self):
636 # Without autocomplete.
637 self.makeCharm(name='foo')
638 self.makeCharm(name='foobar')
639 self.makeCharm(name='barfoo')
640 result = self.index_client.api_search(
641 'bla', autocomplete=False, min_score=0)
642 self.assertEquals(result, [])
643
580 def test_special_characters_return_no_results(self):644 def test_special_characters_return_no_results(self):
581 self.makeCharm(name='foo')645 self.makeCharm(name='foo')
582 result = self.index_client.api_search(646 result = self.index_client.api_search(

Subscribers

People subscribed via source and target branches

to all changes: