
[WIP] Linear chain CRF example #4

Open

vene wants to merge 2 commits into pystruct:master from vene:chains

Conversation

@vene
Contributor

@vene vene commented Jun 27, 2013

Rebased this PR to contain just the linear CRF example, cleaned up. It's visibly slower than it should be for such small data (even with inference='unary'), now we can use it to check why.

@amueller
Member

great! I wanted to have an example like that for a long time!
Maybe it is worth reimplementing the stuff from GraphCRF for the special case of chains, on the other hand it is nice that this required very little code :)

@amueller
Member

One test I usually do is that the energy of the configuration (returned by the inference or by inference.compute_energy) is the same as np.dot(w, psi(x, y)). This ensures that the psi implementation is compatible with the inference implementation.
It is also good to test that the energy returned by loss-augmented inference is equal to np.dot(w, psi(x, y)) + loss.
That ensures that loss-augmented inference does the right thing.
Sometimes I also add a test where I set the weights by hand and check that the result is what I expected it to be.
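The checks described above can be sketched on a toy model. This is a minimal plain-Python illustration, not pystruct's actual API: `psi`, `inference` and `loss_augmented_inference` here are invented stand-ins for a unary-only model over a single variable, just to show the two invariants being asserted.

```python
# Hypothetical toy model (names invented for illustration): one variable,
# n_states labels, len(x) input features. psi(x, y) is the joint feature
# map; inference maximizes the linear energy dot(w, psi(x, y)).

def psi(x, y, n_states):
    """Joint feature map: the input features placed in the block for label y."""
    vec = [0.0] * (n_states * len(x))
    for i, f in enumerate(x):
        vec[y * len(x) + i] = f
    return vec

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def inference(x, w, n_states):
    """Exhaustive argmax over labels; returns (y_hat, energy)."""
    energy, y_hat = max((dot(w, psi(x, y, n_states)), y) for y in range(n_states))
    return y_hat, energy

def loss_augmented_inference(x, y_true, w, n_states):
    """Argmax of energy plus 0/1 loss (single variable, so loss is 0 or 1)."""
    energy, y_hat = max((dot(w, psi(x, y, n_states)) + (y != y_true), y)
                        for y in range(n_states))
    return y_hat, energy

x = [1.0, 2.0]
w = [0.5, -1.0, 2.0, 0.25]  # n_states=2, two features

# Check 1: the energy returned by inference equals dot(w, psi(x, y_hat)).
y_hat, energy = inference(x, w, 2)
assert abs(energy - dot(w, psi(x, y_hat, 2))) < 1e-12

# Check 2: the loss-augmented energy equals dot(w, psi(x, y_hat)) + loss.
y_true = 0
y_hat, energy = loss_augmented_inference(x, y_true, w, 2)
assert abs(energy - (dot(w, psi(x, y_hat, 2)) + (y_hat != y_true))) < 1e-12
```

The same two assertions, written against the real psi and inference routines, catch most psi/inference incompatibilities.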

@amueller
Member

On the other hand, as you inherited everything from GraphCRF you should be fine.
Do you even need the reshape code btw?

@vene
Contributor Author

vene commented Jun 27, 2013

I don't remember exactly if I need it; it might need some refactoring in GraphCRF to accept flat structures.

Something that I wanted to comment on, though it's a bit off-topic here: it's annoying that you need to know n_features when building a model. I did some model selection using sklearn's GridSearchCV and had to implement a hacky wrapper around the learner to reinitialize the model before fitting with n_features=X[0].shape[1]. Did you run into this?


@amueller
Member

Yes, this is annoying.
I wanted the learner to be independent of the data representation, though, so it doesn't know about "n_features".
But it needs to know how large the weight vector is in order to allocate it. On the other hand, the model needs to know how many features it has to compute size_psi.
If you have a nice way around this, I'm all ears.
It would be possible to add an additional function to the models, initialize or get_size_psi, that the learner can call to initialize the model using the data.
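The initialize-from-data idea floated above could look roughly like this. All class and attribute names here are hypothetical sketches, not pystruct's actual interface: the point is only that the learner defers sizing the weight vector until it has seen the data.

```python
# Sketch of the proposed pattern (hypothetical names): the learner calls
# model.initialize(X, Y) before allocating w, so the user never has to
# pass n_features at construction time -- GridSearchCV-friendly.

class ToyModel:
    def __init__(self, n_states=2):
        self.n_states = n_states
        self.size_psi = None  # unknown until we see data

    def initialize(self, X, Y):
        # X: list of sequences; each node is a list of features.
        n_features = len(X[0][0])
        self.size_psi = self.n_states * n_features

class ToyLearner:
    def __init__(self, model):
        self.model = model

    def fit(self, X, Y):
        self.model.initialize(X, Y)           # infer sizes from the data
        self.w = [0.0] * self.model.size_psi  # only now allocate w
        return self

# One sequence with two nodes and three features per node.
X = [[[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]]
learner = ToyLearner(ToyModel(n_states=2)).fit(X, [[0, 1]])
assert len(learner.w) == 6
```

With this shape, a wrapper around the learner is no longer needed for model selection: fit re-derives the sizes on every call.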

@amueller
Member

There are certain times when I am doubtful that having the learner own the model was a good idea. Maybe it should have been the other way around ^^

@amueller
Member

But doing it the other way round, the GridSearchCV would also be kind of awkward.

@vene
Contributor Author

vene commented Jun 27, 2013

Maybe it's the CV that needs to be aware and initialize the model every time?

BTW I got sparse support for the chains with 2 lines of code, but I need to measure the memory use and check that it's not all rendered moot by dense * sparse products. Commits coming tonight hopefully.


@amueller
Member

Sweet. Should I merge the rest in the meantime? Could you remove the commented-out code? And is there a way to redo the example as a plot_ example? That would be most delightful ;)

@amueller
Member

I don't remember exactly if I need it; it might need some refactoring in GraphCRF to accept flat structures.

GraphCRF only accepts flat structures, right? Or maybe we mean different things?

@amueller
Member

Some explanation to the example would also be nice ;) And maybe also add that as a test?


For a chain, the default inference should probably be belief propagation via LibDAI. Move-making doesn't make much sense.
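For what it's worth, exact MAP on a chain is just max-product message passing, i.e. Viterbi, which is why approximate move-making buys nothing here. A minimal plain-Python sketch (all variable names hypothetical, scores rather than log-potentials for simplicity):

```python
# Viterbi MAP decoding on a chain: dynamic programming over states.
# unary[t][s] is the score of state s at node t; pairwise[a][b] is the
# transition score from state a to state b.

def viterbi(unary, pairwise):
    """Return the highest-scoring state sequence for a linear chain."""
    n_states = len(unary[0])
    score = list(unary[0])  # best score ending in each state at node 0
    back = []               # backpointers per node
    for t in range(1, len(unary)):
        prev, score, ptr = score, [], []
        for b in range(n_states):
            best_a = max(range(n_states),
                         key=lambda a: prev[a] + pairwise[a][b])
            score.append(prev[best_a] + pairwise[best_a][b] + unary[t][b])
            ptr.append(best_a)
        back.append(ptr)
    # Backtrack from the best final state.
    y = [max(range(n_states), key=lambda b: score[b])]
    for ptr in reversed(back):
        y.append(ptr[y[-1]])
    return y[::-1]

unary = [[2.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
pairwise = [[0.5, 0.0], [0.0, 0.5]]  # "sticky" transitions
print(viterbi(unary, pairwise))  # → [0, 0, 0]
```

The sticky transitions pull the middle node to state 0 even though its unary score prefers state 1, which is exactly the behavior a chain CRF should exhibit.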

@amueller
Member

I am about to merge #44. I'll leave this one open, though, as I didn't include the syllable example. Maybe you find the time to polish the example a bit. That would be awesome!

@zaxtax
Member

zaxtax commented Aug 10, 2013

I'll try to polish it this weekend. Unless @vene is on it

@amueller
Member

Thanks @zaxtax that would be awesome. I might branch before that but I can always cherry-pick it later.

@amueller
Member

cool I'll give it a shot later / tomorrow :)

@larsmans
Contributor

Just wondering, what kind of results do you get on this? I ported your example to seqlearn and I only got ~43% accuracy with your features. I probably did something wrong :(

@vene
Contributor Author

vene commented Aug 14, 2013

With the example exactly as it is (so with the SubgradientSSVM) I get a 0.72 test score (mind how I measure the score: number of instances where all individual "zeros" were perfectly matched).

Could I have the code you used with seqlearn?


@larsmans
Contributor

mind how I measure the score: number of instances where all
individual "zeros" were perfectly matched

Are you sure? Your code says

score = np.mean([np.all((y_t == 0) == (y_p == 0))
                 for (y_t, y_p) in zip(y_test, y_pred)])
print("Test score: {:2.2f}".format(ssvm.score(X_test, y_test)))

Mine is here: https://gist.github.com/larsmans/6235887. It's messy.

@vene
Contributor Author

vene commented Aug 14, 2013

I'm pretty sure, but I agree my line is very dense and unclear.

Basically, before scoring it turns the 0, 1, 2, ..., 7 label set into {0, 1}, by turning 0 into 1 and everything else into 0. Then for each sequence it calls np.all(y_true == y_pred) on this transformed tagging (i.e. it doesn't matter if you mislabel 6 as 5 as long as you get all the 0s right). Then np.mean over this should essentially compute the overall word-level accuracy.
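Spelled out step by step, the metric described above looks like this (plain Python rather than numpy; the toy sequences are hypothetical):

```python
# Step-by-step version of the binarized score: a sequence counts as
# correct iff its zero / non-zero pattern is reproduced exactly;
# confusing one non-zero label with another is ignored.

def all_zeros_matched(y_true, y_pred):
    """True iff positions labeled 0 agree exactly between the sequences."""
    return all((t == 0) == (p == 0) for t, p in zip(y_true, y_pred))

y_test = [[0, 3, 0, 5], [1, 0, 2]]
y_pred = [[0, 6, 0, 2], [1, 1, 0]]  # first: zeros match; second: they don't

score = sum(all_zeros_matched(t, p)
            for t, p in zip(y_test, y_pred)) / len(y_test)
print(score)  # → 0.5
```

This is the same computation as the np.mean/np.all one-liner quoted earlier, just unrolled.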

I re-ran the example with a vanilla perceptron with 5 iterations (everything else is pystruct default except for verbose=1):

iteration 0
avg loss: 0.357857 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 1
avg loss: 0.319921 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 2
avg loss: 0.311667 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 3
avg loss: 0.310476 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
iteration 4
avg loss: 0.297222 w: [ 0.  0.  0. ...,  0.  0.  0.]
effective learning rate: 1.000000
Test score: 0.68

So basically quite a bit worse loss than seqlearn, yet strangely a better end result (?)

The SVM gets 0.72, while (in some experiments I did a month ago) CRFsuite with window size = 3*, averaged perceptron and cross-validation for max_iter gets 0.768 (same kind of score).

  • by window size I mean the feature function also includes c[-3], c[-3-2], c[-3-2-1] and the symmetricals. At least I think this is what I coded it like; it makes sense that it gets 0.05 more accuracy.


@vene
Contributor Author

vene commented Aug 14, 2013

Setting trans_features=True results in a crash though. I'll investigate tomorrow.

Iteration  1... python(70779,0x7fff74a6e180) malloc: *** error for object 0xbff0000000000000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

@larsmans
Contributor

What I mean is that you compute the custom score, but then print ssvm.score. What does that compute?

(trans_features is less stable than I hoped. I'm working on that today.)

@vene
Contributor Author

vene commented Aug 15, 2013

Oops, yeah. Sorry about that, now I get 46% too.


@pietrodn

I manually applied this patch to pystruct and tested the performance with the sparse matrices support. RAM usage is decreased ~10x and the speed is increased ~6x. I needed this because FrankWolfeSSVM with ChainCRF was eating up gigabytes of RAM for certain inputs.

I am currently trying to classify bytes inside binary executable files, so for each byte in the binary file I have a one-hot vector of 256 elements to give as input to the ChainCRF. Without sparse matrices, the input size would be 256 times the actual size of the binary!

Thank you! Please consider merging this :)
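The 256x blow-up described above is easy to see with a back-of-the-envelope sketch (plain Python, hypothetical data; a real implementation would use scipy.sparse):

```python
# Each byte becomes a 256-dimensional one-hot row: the dense matrix
# stores 256 entries per byte, while a sparse COO-style representation
# stores only the single non-zero per row.

data = bytes([0x90, 0x90, 0xc3, 0x55])  # a tiny hypothetical byte string

# Dense one-hot: n_bytes x 256 matrix of floats.
dense = [[1.0 if j == b else 0.0 for j in range(256)] for b in data]
dense_entries = sum(len(row) for row in dense)

# Sparse: one (row, column, value) triple per byte.
sparse = [(i, b, 1.0) for i, b in enumerate(data)]

print(dense_entries // len(sparse))  # → 256, the factor mentioned above
```

The actual RAM and speed gains depend on how much of the downstream arithmetic stays sparse, which is exactly the dense * sparse concern raised earlier in this thread.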

@amueller
Member

amueller commented Aug 2, 2017

@pietrodn would you mind picking this up and creating some examples and benchmarks?
