Drizzle

json server

Registered by Mohit Srivastava on 2012-03-30

Top level blueprint for json_server, 2012. We will break out sub-blueprints to do more detailed designs as we go along.

Blueprint information

Status: Started

Approver: None

Priority: Medium

Drafter: Mohit Srivastava

Direction: Needs approval

Assignee: Mohit Srivastava

Definition: New

Series goal: Proposed for 7.2

Implementation: Started

Milestone target: future

Started by: Mohit Srivastava on 2012-03-30

Related branches

lp:~hingo/drizzle/drizzle-json_server-keyvalue

Whiteboard

Version 0.3 => usability, refactoring and multithreading

DONE

Tests for /json api

The JSON parser throws exceptions readily, and an uncaught one crashes the entire Drizzle server, so all of this code needs to be wrapped in try/catch blocks. There are also segmentation faults in cases like GET http://localhost:8086/json
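A minimal sketch of the defensive pattern, written in Python for brevity rather than in the plugin's C++. The function name `handle_json_request` and the response shape are hypothetical. Treating an empty body as a client error is an assumption about one way to handle requests that carry no query document; it is not a diagnosis of the segfault.

```python
import json

def handle_json_request(body):
    """Parse a request body without letting a parser error escape.

    Returns (http_status, response_dict). Hypothetical helper, not json_server code.
    """
    if not body:
        # Assumed policy: a request without a query document is a client error.
        return 400, {"error": "empty request"}
    try:
        query = json.loads(body)
    except (ValueError, TypeError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        return 400, {"error": "invalid json: %s" % exc}
    if not isinstance(query, dict):
        return 400, {"error": "query must be a json object"}
    return 200, {"query": query}
```

The point is that every failure path ends in a normal HTTP response, so a bad request can never take the server down.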

It probably makes sense to make this code object oriented and to split it across multiple files.

Currently the http library is *not* multi-threaded. We need to create a pool of processing threads to which the libevent http server hands over each request.
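The hand-over idea can be sketched roughly like this. It is Python, not libevent code, and `process_request` and `serve` are hypothetical names. The accepting loop only submits work; the pool threads do the processing.

```python
from concurrent.futures import ThreadPoolExecutor

def process_request(request_id):
    # Placeholder for parsing the json and running the query.
    return {"request": request_id, "status": "ok"}

def serve(request_ids, workers=4):
    """Hand each request to a pool thread; the accepting loop runs no queries itself."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_request, rid) for rid in request_ids]
        # Collect results in submission order.
        return [future.result() for future in futures]
```

A fixed pool size keeps the number of concurrent queries bounded, which would probably need to be a drizzled option as well.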

The /version API should also return the version of json_server. Proposed key name is json_server_version.

Small usability improvements, such as allowing the default schema and table name to be set as drizzled options.

NEXT SPRINT

Re-factor the existing code using storage engine API.

Currently we return Content-Type text/html, which is wrong for json responses; it should be application/json.

We should also be more correct with http response codes: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

The return json object should also contain:
last_insert_id (for inserts)
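A rough sketch of what a corrected response could look like, in Python for illustration. `build_response` and the `ok` key are hypothetical; the status codes shown in the usage note are standard HTTP ones, but which code json_server should use for which case is still to be decided.

```python
import json

def build_response(status, payload, last_insert_id=None):
    """Return (status, headers, body) with a json content type.

    last_insert_id is only added when the caller supplies one (i.e. for inserts).
    """
    body = dict(payload)
    if last_insert_id is not None:
        body["last_insert_id"] = last_insert_id
    headers = {"Content-Type": "application/json"}
    return status, headers, json.dumps(body, sort_keys=True)
```

For example, an insert might answer with `build_response(201, {"ok": True}, last_insert_id=7)`, a malformed query with status 400, and a lookup of a missing _id with status 404.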

BACKLOG

The code that is now in SQLGenerator doesn't escape strings that go into SQL, nor does it backtick-escape column names. This can lead to both errors and exploits.
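To make the two kinds of escaping concrete, here is a small Python sketch. It assumes MySQL-style quoting rules (backticks doubled inside identifiers, single quotes doubled and backslashes escaped inside string literals); the function names are hypothetical and this is not the SQLGenerator code. It does not handle every edge case (NUL bytes, character set issues), so the server's own escaping routines would be the better choice if they are reachable from the plugin.

```python
def quote_identifier(name):
    """Backtick-quote a column or table name, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def quote_string(value):
    """Quote a string literal: escape backslashes first, then double single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

def build_select(schema, table, doc_id):
    """Build a lookup by _id; int() rejects anything that is not a plain integer."""
    return "SELECT * FROM %s.%s WHERE %s = %d" % (
        quote_identifier(schema),
        quote_identifier(table),
        quote_identifier("_id"),
        int(doc_id),
    )
```

Forcing _id through `int()` means an input such as `1 OR 1=1` raises an error instead of ending up in the SQL.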

The demo GUI returned from root should be broken out to its own html file. The current approach where it is strings in C++ code is terrible to maintain. A followup to this would be to implement a simple webserver that could also serve any other files, for instance under http://localhost:8086/files/*

The return json object should also contain (winter project?)
affected rows (for all queries)

Perhaps also add authentication at this point. Or later. (I'm torn between doing http authentication or just tossing a username and password into the json query structure.)

Version 0.4 (and beyond, don't know what would be each specific version)

Range scans for key:
Ability to specify a minimum and maximum value for _id. For this we also need to specify how to express that in json.
In the current version you can do { "_id" : 1 }, but json doesn't allow { "_id" > 1 AND "_id" < 10 }, so we have to invent some syntax there.
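One possible syntax, purely as a suggestion: borrow the operator style MongoDB uses, so that a range becomes { "_id" : { "$gt" : 1, "$lt" : 10 } }. The Python sketch below shows how such a structure could be turned into a WHERE fragment. The operator set and the function name `id_condition` are assumptions, not something decided in this blueprint.

```python
OPERATORS = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def id_condition(spec):
    """Translate {"_id": 1} or {"_id": {"$gt": 1, "$lt": 10}} into a WHERE fragment."""
    value = spec["_id"]
    if not isinstance(value, dict):
        return "`_id` = %d" % int(value)
    parts = []
    for op in sorted(value):
        if op not in OPERATORS:
            raise ValueError("unsupported operator: %s" % op)
        parts.append("`_id` %s %d" % (OPERATORS[op], int(value[op])))
    if not parts:
        raise ValueError("empty range for _id")
    return " AND ".join(parts)
```

The existing { "_id" : 1 } form keeps working, so this would be a backwards compatible extension.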

SELECT and UPDATE on secondary key:
GET { "document" : { "person" : { "firstname" : "Henrik" } } } will return this record: { "_id" : 1, "document" : { "person" : { "firstname" : "Henrik", "lastname" : "Ingo" } } }
To achieve that we will utilize the JS plugin in the select queries.
See http://docs.drizzle.org/plugins/js/index.html
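The matching rule implied by the example (the query is a partial document, and a record matches if it contains everything in the query) can be sketched as below. This is Python for illustration only; in json_server the test would run through the JS plugin inside the SELECT, and the name `matches` is hypothetical.

```python
def matches(document, query):
    """True if every key/value in query appears in document, recursing into nested objects."""
    for key, expected in query.items():
        if key not in document:
            return False
        actual = document[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(actual, expected):
                return False
        elif actual != expected:
            return False
    return True
```

With the record above, the query { "person" : { "firstname" : "Henrik" } } matches even though the record also has a lastname, while a different firstname does not.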

Oh, btw the JS plugin is still single threaded too.
Might be worth fixing before going into next step:

After that, arbitrary combinations of the query parameters. For instance, give a range of _id values, then some other key like "firstname", and all of this goes into the WHERE clause of the SQL.

Now, all this looks nice except that _id is the only thing that is indexed. It would be nice to also have something that resembles secondary indexes, but what are we going to do when our data is just some json in a TEXT field?
Plan here is to:
* Implement something like stored procedures, building on the JS plugin.
* After that it is easy to implement something like triggers.
* Add a json/http command with which the user can tell Drizzle that a secondary index should be maintained for some key inside the json document.
* Create a helper table; this is our index.
* Using our JavaScript triggers, whenever an inserted or updated json document matches the key given by the user, insert the key+value and the corresponding _id value into the index table.
* When receiving a query that uses the "secondary index", query the helper table and then join by the _id field to the actual json table.

At this point we would really have covered pretty much everything that is possible to do with single table, single statement operations. I don't see that multi-statement transactions are necessarily what you want to start doing over http anyway, so it might be the end of the road. We would now have pretty much feature parity with MongoDB and CouchDB.

As a side effect we would also have created JavaScript based (quasi) stored procedures and triggers into Drizzle. We don't need it, but to make these procedures generally useful, one would also want to bind the Execute API to be available from within JavaScript, so that you could create JavaScript procedures that do something like
res = db.execute("SELECT * FROM mytable");
// now iterate over res and do something...

and maybe a "CALL myprocedure();" syntax into Drizzle parser to conveniently call these procedures.

Subscribers

Henrik Ingo

Mohit Srivastava

Stewart Smith