
[WIP] Linear chain CRF example #4

Open

vene wants to merge 2 commits into pystruct:master from vene:chains

Conversation

@vene
Contributor

@vene vene commented Jun 27, 2013

Rebased this PR to contain just the linear CRF example, cleaned up. It's visibly slower than it should be for such small data (even with inference='unary'), now we can use it to check why.

@amueller
Member

great! I wanted to have an example like that for a long time!
Maybe it is worth reimplementing the stuff from GraphCRF for the special case of chains, on the other hand it is nice that this required very little code :)

@amueller
Member

One test I usually do is that the energy of the configuration (returned by the inference or by inference.compute_energy) is the same as np.dot(w, psi(x, y)). This ensures that the psi implementation is compatible with the inference implementation.
It is also good to test that the energy returned by loss-augmented inference is equal to np.dot(w, psi(x, y)) + loss.
That ensures that loss-augmented inference does the right thing.
Sometimes I also add a test where I set the weights by hand and check that the result is what I expected it to be.
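The checks described above can be sketched on a toy model. This is a minimal plain-Python illustration, not pystruct's actual API: `psi`, `inference` and `loss_augmented_inference` here are invented stand-ins for a unary-only model over a single variable, just to show the two invariants being asserted.

```python
# Hypothetical toy model (names invented for illustration): one variable,
# n_states labels, len(x) input features. psi(x, y) is the joint feature
# map; inference maximizes the linear energy dot(w, psi(x, y)).

def psi(x, y, n_states):
    """Joint feature map: the input features placed in the block for label y."""
    vec = [0.0] * (n_states * len(x))
    for i, f in enumerate(x):
        vec[y * len(x) + i] = f
    return vec

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def inference(x, w, n_states):
    """Exhaustive argmax over labels; returns (y_hat, energy)."""
    energy, y_hat = max((dot(w, psi(x, y, n_states)), y) for y in range(n_states))
    return y_hat, energy

def loss_augmented_inference(x, y_true, w, n_states):
    """Argmax of energy plus 0/1 loss (single variable, so loss is 0 or 1)."""
    energy, y_hat = max((dot(w, psi(x, y, n_states)) + (y != y_true), y)
                        for y in range(n_states))
    return y_hat, energy

x = [1.0, 2.0]
w = [0.5, -1.0, 2.0, 0.25]  # n_states=2, two features

# Check 1: the energy returned by inference equals dot(w, psi(x, y_hat)).
y_hat, energy = inference(x, w, 2)
assert abs(energy - dot(w, psi(x, y_hat, 2))) < 1e-12

# Check 2: the loss-augmented energy equals dot(w, psi(x, y_hat)) + loss.
y_true = 0
y_hat, energy = loss_augmented_inference(x, y_true, w, 2)
assert abs(energy - (dot(w, psi(x, y_hat, 2)) + (y_hat != y_true))) < 1e-12
```

The same two assertions, written against the real psi and inference routines, catch most psi/inference incompatibilities.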

@amueller
Member

On the other hand, as you inherited everything from GraphCRF you should be fine.
Do you even need the reshape code btw?

@vene
Contributor Author

vene commented Jun 27, 2013

I don't remember exactly if I need it; it might need some refactoring in GraphCRF to accept flat structures.

Something that I wanted to comment on, though it's a bit off-topic here: it's annoying that you need to know n_features when building a model. I did some model selection using sklearn's GridSearchCV and had to implement a hacky wrapper around the learner to reinitialize the model before fitting with n_features=X[0].shape[1]. Did you run into this?


@amueller
Member

Yes, this is annoying.
I wanted the learner to be independent of the data representation, though, so it doesn't know about "n_features".
But it needs to know how large the weight vector is in order to allocate it. On the other hand, the model needs to know how many features it has to compute size_psi.
If you have a nice way around this, I'm all ears.
It would be possible to add an additional function to the models, initialize or get_size_psi, that the learner can call to initialize the model using the data.
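The initialize-from-data idea floated above could look roughly like this. All class and attribute names here are hypothetical sketches, not pystruct's actual interface: the point is only that the learner defers sizing the weight vector until it has seen the data.

```python
# Sketch of the proposed pattern (hypothetical names): the learner calls
# model.initialize(X, Y) before allocating w, so the user never has to
# pass n_features at construction time -- GridSearchCV-friendly.

class ToyModel:
    def __init__(self, n_states=2):
        self.n_states = n_states
        self.size_psi = None  # unknown until we see data

    def initialize(self, X, Y):
        # X: list of sequences; each node is a list of features.
        n_features = len(X[0][0])
        self.size_psi = self.n_states * n_features

class ToyLearner:
    def __init__(self, model):
        self.model = model

    def fit(self, X, Y):
        self.model.initialize(X, Y)           # infer sizes from the data
        self.w = [0.0] * self.model.size_psi  # only now allocate w
        return self

# One sequence with two nodes and three features per node.
X = [[[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]]
learner = ToyLearner(ToyModel(n_states=2)).fit(X, [[0, 1]])
assert len(learner.w) == 6
```

With this shape, a wrapper around the learner is no longer needed for model selection: fit re-derives the sizes on every call.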

@amueller
Member

There are certain times when I am doubtful that having the learner own the model was a good idea. Maybe it should have been the other way around ^^

@amueller
Member

But doing it the other way round, the GridSearchCV would also be kind of awkward.

@vene
Contributor Author

vene commented Jun 27, 2013

Maybe it's the CV that needs to be aware and initialize the model every time?

BTW I got sparse support for the chains with 2 lines of code, but I need to measure the memory use and check that it's not all rendered moot by dense * sparse products. Commits coming tonight hopefully.


@amueller
Member

Sweet. Should I merge the rest in the meantime? Could you remove the commented-out code? And is there a way to redo the example as a plot_ example? That would be most delightful ;)

@amueller
Member

I don't remember exactly if I need it; it might need some refactoring in GraphCRF to accept flat structures.

GraphCRF only accepts flat structures, right? Or maybe we mean different things?

@amueller
Member

Some explanation to the example would also be nice ;) And maybe also add that as a test?


For a chain, the default inference should probably be belief propagation via LibDAI. Move-making doesn't make much sense.
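For what it's worth, exact MAP on a chain is just max-product message passing, i.e. Viterbi, which is why approximate move-making buys nothing here. A minimal plain-Python sketch (all variable names hypothetical, scores rather than log-potentials for simplicity):

```python
# Viterbi MAP decoding on a chain: dynamic programming over states.
# unary[t][s] is the score of state s at node t; pairwise[a][b] is the
# transition score from state a to state b.

def viterbi(unary, pairwise):
    """Return the highest-scoring state sequence for a linear chain."""
    n_states = len(unary[0])
    score = list(unary[0])  # best score ending in each state at node 0
    back = []               # backpointers per node
    for t in range(1, len(unary)):
        prev, score, ptr = score, [], []
        for b in range(n_states):
            best_a = max(range(n_states),
                         key=lambda a: prev[a] + pairwise[a][b])
            score.append(prev[best_a] + pairwise[best_a][b] + unary[t][b])
            ptr.append(best_a)
        back.append(ptr)
    # Backtrack from the best final state.
    y = [max(range(n_states), key=lambda b: score[b])]
    for ptr in reversed(back):
        y.append(ptr[y[-1]])
    return y[::-1]

unary = [[2.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
pairwise = [[0.5, 0.0], [0.0, 0.5]]  # "sticky" transitions
print(viterbi(unary, pairwise))  # → [0, 0, 0]
```

The sticky transitions pull the middle node to state 0 even though its unary score prefers state 1, which is exactly the behavior a chain CRF should exhibit.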

@amueller
Member

I am about to merge #44. I'll leave this one open, though, as I didn't include the syllable example. Maybe you find the time to polish the example a bit. That would be awesome!

@zaxtax
Member

zaxtax commented Aug 10, 2013

I'll try to polish it this weekend. Unless @vene is on it

@amueller
Member

Thanks @zaxtax that would be awesome. I might branch before that but I can always cherry-pick it later.

@amueller
Member

cool I'll give it a shot later / tomorrow :)

@larsmans
Contributor

Just wondering, what kind of results do you get on this? I ported your example to seqlearn and I only got ~43% accuracy with your features. I probably did something wrong :(

@vene
Contributor Author

vene commented Aug 14, 2013

With the example exactly as it is (so with the SubgradientSSVM) I get a 0.72 test score (mind how I measure the score: number of instances where all individual "zeros" were perfectly matched).

Could I have the code you used with seqlearn?


@larsmans
Contributor

mind how I measure the score: number of instances where all
individual "zeros" were perfectly matched

Are you sure? Your code says

score = np.mean([np.all((y_t == 0) == (y_p == 0))
                 for (y_t, y_p) in zip(y_test, y_pred)])
print("Test score: {:2.2f}".format(ssvm.score(X_test, y_test)))

Mine is here: https://gist.github.com/larsmans/6235887. It's messy.

@vene
Contributor Author

vene commented Aug 14, 2013

I'm pretty sure, but I agree my line is very dense and unclear.

Basically, before scoring it turns the 0, 1, 2, ..., 7 label set into {0, 1}, by turning 0 into 1 and everything else into 0. Then for each sequence it calls np.all(y_true == y_pred) on this transformed tagging (i.e. it doesn't matter if you mislabel 6 as 5 as long as you get all the 0s right). Then np.mean over this should essentially compute the overall word-level accuracy.
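Spelled out step by step, the metric described above looks like this (plain Python rather than numpy; the toy sequences are hypothetical):

```python
# Step-by-step version of the binarized score: a sequence counts as
# correct iff its zero / non-zero pattern is reproduced exactly;
# confusing one non-zero label with another is ignored.

def all_zeros_matched(y_true, y_pred):
    """True iff positions labeled 0 agree exactly between the sequences."""
    return all((t == 0) == (p == 0) for t, p in zip(y_true, y_pred))

y_test = [[0, 3, 0, 5], [1, 0, 2]]
y_pred = [[0, 6, 0, 2], [1, 1, 0]]  # first: zeros match; second: they don't

score = sum(all_zeros_matched(t, p)
            for t, p in zip(y_test, y_pred)) / len(y_test)
print(score)  # → 0.5
```

This is the same computation as the np.mean/np.all one-liner quoted earlier, just unrolled.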

I re-ran the example with a vanilla perceptron with 5 iterations (everything else is pystruct default except for verbose=1):

iteration 0
avg loss: 0.357857 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 1
avg loss: 0.319921 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 2
avg loss: 0.311667 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 3
avg loss: 0.310476 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 4
avg loss: 0.297222 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
Test score: 0.68

So basically quite a bit worse loss than seqlearn, yet strangely a better end result (?)

The SVM gets 0.72, while (in some experiments I did a month ago) CRFsuite with window size = 3*, averaged perceptron and cross-validation for max_iter gets 0.768 (same kind of score).

  • by window size I mean the feature function also includes c[-3], c[-3-2], c[-3-2-1] and the symmetricals. At least I think this is what I coded it like; it makes sense that it gets 0.05 more accuracy.


@vene
Contributor Author

vene commented Aug 14, 2013

Setting trans_features=True results in a crash though. I'll investigate tomorrow.

Iteration  1... python(70779,0x7fff74a6e180) malloc: *** error for object 0xbff0000000000000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

@larsmans
Contributor

What I mean is that you compute the custom score, but then print ssvm.score. What does that compute?

(trans_features is less stable than I hoped. I'm working on that today.)

@vene
Contributor Author

vene commented Aug 15, 2013

Oops, yeah. Sorry about that, now I get 46% too.


@pietrodn

I manually applied this patch to pystruct and tested the performance with the sparse matrices support. RAM usage is decreased ~10x and the speed is increased ~6x. I needed this because FrankWolfeSSVM with ChainCRF was eating up gigabytes of RAM for certain inputs.

I am currently trying to classify bytes inside binary executable files, so for each byte in the binary file I have a one-hot vector of 256 elements to give as input to the ChainCRF. Without sparse matrices, the input size would be 256 times the actual size of the binary!

Thank you! Please consider merging this :)
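The 256x blow-up described above is easy to see with a back-of-the-envelope sketch (plain Python, hypothetical data; a real implementation would use scipy.sparse):

```python
# Each byte becomes a 256-dimensional one-hot row: the dense matrix
# stores 256 entries per byte, while a sparse COO-style representation
# stores only the single non-zero per row.

data = bytes([0x90, 0x90, 0xc3, 0x55])  # a tiny hypothetical byte string

# Dense one-hot: n_bytes x 256 matrix of floats.
dense = [[1.0 if j == b else 0.0 for j in range(256)] for b in data]
dense_entries = sum(len(row) for row in dense)

# Sparse: one (row, column, value) triple per byte.
sparse = [(i, b, 1.0) for i, b in enumerate(data)]

print(dense_entries // len(sparse))  # → 256, the factor mentioned above
```

The actual RAM and speed gains depend on how much of the downstream arithmetic stays sparse, which is exactly the dense * sparse concern raised earlier in this thread.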

@amueller
Member

amueller commented Aug 2, 2017

@pietrodn would you mind picking this up and creating some examples and benchmarks?
