A new proposal for indexing with labels

Registered by kwgoodman

In a blueprint titled "index-by-label" I proposed a way to index larrys by lists of label elements. Here's a simpler, but less versatile, proposal. On the whole, due to its simplicity, I think it is more powerful.

You can index into larrys just like you index into numpy arrays. To index into numpy arrays you can use integers, slices, etc., but you can't use strings: strings have no meaning when indexing an ordinary numpy array. Therefore we are free to assign a special meaning to strings when they are used to index a larry.

My proposal is to interpret strings as label elements. So for example:

>> import la
>> y = la.larry([1,2,3], [['a', 'b', 4]])

>> y['a']
   1

>> y['b':] # <---- slick!
label_0
    b
    4
x
array([2, 3])

>> y['4']
   3

Note the last example above. We indexed with the string '4', but there is no string '4' in the label, only the integer 4. The algorithm first looks for the string '4' in the label; if it is not found, it converts each label element to a string and looks again.
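A minimal Python sketch of that two-pass lookup for a single axis follows. The helper name `label_position` is hypothetical and this is not la's actual code; it assumes the label is a plain Python list and that a string matching nothing should raise `IndexError`.

```python
import datetime


def label_position(label, key):
    """Return the position in `label` of the element matching the string `key`.

    Hypothetical helper sketching the two-pass lookup; not la's actual code.
    """
    # Pass 1: look for the string itself, e.g. 'a' in ['a', 'b', 4].
    try:
        return label.index(key)
    except ValueError:
        pass
    # Pass 2: map each label element to a string and look again, so that
    # '4' matches the integer 4 and '2009-01-02' matches a datetime.date.
    as_strings = [str(element) for element in label]
    try:
        return as_strings.index(key)
    except ValueError:
        raise IndexError(f"label element {key!r} not found") from None


print(label_position(['a', 'b', 4], 'a'))  # 0
print(label_position(['a', 'b', 4], '4'))  # 2
dates = [datetime.date(2009, 1, 1), datetime.date(2009, 1, 2)]
print(label_position(dates, '2009-01-02'))  # 1
```

In this sketch, if a label held both the string '4' and the integer 4, the string would win, because pass 1 finds it before any conversion happens.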

I think it is quite powerful. It does add some overhead to indexing that doesn't use strings, but not much. The biggest cost is checking whether slice objects contain strings. Indexing with a single integer (y[5]), for example, adds no overhead.
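The extra check can be sketched as one function that decides whether an index needs any label lookup at all. The name `contains_label_string` is hypothetical, and the sketch assumes that only a slice's start and stop (not its step) may hold label strings. A plain integer takes the first branch and returns immediately.

```python
def contains_label_string(index):
    """Return True if `index` needs a label lookup (hypothetical helper).

    `index` is whatever Python passes to __getitem__: an int, a str,
    a slice, or a tuple of those.
    """
    if isinstance(index, int):
        return False  # y[5]: one type check, nothing else to inspect
    if isinstance(index, str):
        return True   # y['a']
    if isinstance(index, slice):
        # The main extra cost: look inside the slice object.
        return isinstance(index.start, str) or isinstance(index.stop, str)
    if isinstance(index, tuple):
        # Multi-axis index such as y[0, 'ibm', 2]: check each element.
        return any(contains_label_string(item) for item in index)
    return False      # anything else (e.g. a numpy array) is left to numpy


print(contains_label_string(5))                    # False
print(contains_label_string(slice('b', None)))     # True
print(contains_label_string((0, 'ibm', 2)))        # True
print(contains_label_string((0, 1, slice(None))))  # False
```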

Here are some more examples:

>> from la import larry
>> import numpy as np
>> import datetime
>> d = datetime.date
>>
>> x = np.arange(24).reshape(2,3,4)
>> label = [['price', 'volume'], ['aapl', 'ibm', 'dell'], [d(2009,1,1), d(2009,1,2), d(2009,1,3), d(2009,1,4)]]
>> y = larry(x, label)

>> y['price']
label_0
    aapl
    ibm
    dell
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

>> y['price', 'aapl']
label_0
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([0, 1, 2, 3])

>> y['price', 'aapl':]
label_0
    aapl
    ibm
    dell
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

>> y['price', 'aapl', '2009-01-02']
   1

>> y['price', 'dell', '2009-01-02']
   9

>> y[:, 'dell', :]
label_0
    price
    volume
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 8,  9, 10, 11],
       [20, 21, 22, 23]])

>> y[0, 'ibm', 2]
   6

>> y[0, 'ibm', :]
label_0
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([4, 5, 6, 7])
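Putting the pieces together, here is a toy sketch of how a larry-like container could translate these mixed indices into plain numpy indexing. `LabeledArray` is a hypothetical class, not `la.larry`; it handles only integers, slices, and label strings, and it assumes a string slice stop is exclusive, as in ordinary Python slicing (the examples above only use string starts). Python passes `y[:, 'dell', :]` to `__getitem__` as a tuple of `slice(None, None, None)`, `'dell'`, `slice(None, None, None)`, and `y['b':]` as `slice('b', None, None)`, so each element can be converted axis by axis.

```python
import datetime

import numpy as np


class LabeledArray:
    """Toy stand-in for a larry (hypothetical, not la's implementation).

    `x` is a numpy array; `label` holds one list of label elements per axis.
    """

    def __init__(self, x, label):
        self.x = np.asarray(x)
        self.label = label

    @staticmethod
    def _position(elements, key):
        # Two-pass lookup: the string itself, then the string form of each element.
        if key in elements:
            return elements.index(key)
        return [str(e) for e in elements].index(key)

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        int_index, new_label = [], []
        for axis, key in enumerate(index):
            elements = self.label[axis]
            if isinstance(key, str):
                key = self._position(elements, key)
            elif isinstance(key, slice):
                start, stop = key.start, key.stop
                if isinstance(start, str):
                    start = self._position(elements, start)
                if isinstance(stop, str):
                    # Assumption: a string stop is exclusive, like a Python slice.
                    stop = self._position(elements, stop)
                key = slice(start, stop, key.step)
                new_label.append(elements[key])  # a sliced axis keeps part of its label
            # An axis indexed by a single integer is dropped, as in numpy.
            int_index.append(key)
        # Axes not mentioned in the index keep their full label.
        new_label.extend(self.label[len(index):])
        out = self.x[tuple(int_index)]
        if np.ndim(out) == 0:
            return out  # a single element, e.g. y['price', 'aapl', '2009-01-02']
        return LabeledArray(out, new_label)


d = datetime.date
x = np.arange(24).reshape(2, 3, 4)
label = [['price', 'volume'], ['aapl', 'ibm', 'dell'],
         [d(2009, 1, 1), d(2009, 1, 2), d(2009, 1, 3), d(2009, 1, 4)]]
y = LabeledArray(x, label)

print(y['price', 'dell', '2009-01-02'])  # 9
print(y[:, 'dell', :].x.tolist())        # [[8, 9, 10, 11], [20, 21, 22, 23]]
print(y[:, 'dell', :].label[0])          # ['price', 'volume']
print(y[0, 'ibm', :].x.tolist())         # [4, 5, 6, 7]
```

Converting every label string to an integer before touching the data means the actual selection is always ordinary numpy indexing; only the label bookkeeping is new.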

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None
