Numpy, the Python foundation for number crunching

                              Didrik Pinte, Enthought
                            London Data Science meetup




Monday 22 October 2012
Number crunching?

        •High-level API
        •Interactivity & visualization
        •Performance
        •Low-level access




Evidence?

        •Clyther
        •PyOpenGL
        •PyCuda
        •PyAlgoTrade
        •PyTrilinos
        •Joblib
        •numpy-boost
        •petsc4py



Why then?

        •The API ...
        •Simple but powerful memory model
        •Open access to the data
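One way to read "open access to the data" is that an array's memory can be shared with other code without copying. The sketch below is an illustration under that assumption, not code from the talk: it wraps an existing `bytearray` with `np.frombuffer` and then exposes the same memory through a `memoryview`. The variable names are made up for the example.

```python
import numpy as np

# A plain Python bytearray large enough for 4 float64 values (4 * 8 bytes).
raw = bytearray(32)

# np.frombuffer wraps the existing memory; no copy is made.
values = np.frombuffer(raw, dtype=np.float64)
values[:] = [1.0, 2.0, 3.0, 4.0]  # writes land directly in `raw`

# The other direction: a memoryview exposes the array's buffer to any
# consumer of the Python buffer protocol.
view = memoryview(values)
view[0] = 10.0  # modifies the same underlying bytes

# All three names refer to one block of memory.
same_bytes = bytes(raw[:8]) == np.float64(10.0).tobytes()
```

Because nothing is copied, C extensions, GPU wrappers and plotting libraries can all operate on the same block of memory.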




Array data structure
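The picture for this slide did not survive, so the following is only a minimal sketch of what an array carries: a data buffer plus metadata (dtype, shape, strides). It assumes 8-byte `float64` elements. Reshaping, slicing and transposing only change the metadata and return views onto the same buffer.

```python
import numpy as np

a = np.arange(1, 13, dtype=np.float64)  # 12 contiguous float64 values

itemsize = a.dtype.itemsize   # 8 bytes per element
shape_1d = a.shape            # (12,)
strides_1d = a.strides        # (8,): move 8 bytes to reach the next element

# reshape returns a view: same buffer, different shape and strides.
b = a.reshape(3, 4)
strides_2d = b.strides        # (32, 8): 32 bytes to the next row, 8 to the next column

# Slicing with a step is also a view; only the stride changes.
c = a[::2]
strides_step = c.strides      # (16,)

# Transposing swaps the strides instead of moving data.
t = b.T
strides_t = t.strides         # (8, 32)

# A write through one view is visible through all the others.
b[0, 0] = 100.0
```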




Let’s look at the code!

        •Examples:
             – API / interactivity
             – memory management with stride_tricks
             – pikos
             – extensions with talib
             – (joblib, memmap/multiprocessing,
               IPython parallel)




API / interactivity




Memory management

        1   2   3   4   5   6   7   8   9   10   11   12

                            =

        1   2   3    4
        2   3   4    5
        3   4   5    6
        4   5   6    7
        5   6   7    8
        6   7   8    9
        7   8   9    10
        8   9   10   11
        ...



Memory management

        Shape (12,), strides (8,)

        1   2   3   4   5   6   7   8   9   10   11   12

                            =

        Shape (9, 4), strides (8, 8)

        1   2   3    4
        2   3   4    5
        3   4   5    6
        4   5   6    7
        5   6   7    8
        6   7   8    9
        7   8   9    10
        8   9   10   11
        ...
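The figure above can be reproduced with `numpy.lib.stride_tricks.as_strided`. This is a minimal sketch, assuming a contiguous `float64` array (hence 8-byte strides). Reading the stride from the array instead of hard-coding 8, and passing `writeable=False`, are precautions added here and are not taken from the slides.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1, 13, dtype=np.float64)   # shape (12,), strides (8,)

window = 4
n_windows = a.size - window + 1          # 9 overlapping windows
step = a.strides[0]                      # 8 bytes for float64

# Both axes advance by one element: moving down a row or across a
# column shifts the read position by the same 8 bytes.
windows = as_strided(
    a,
    shape=(n_windows, window),
    strides=(step, step),
    writeable=False,  # overlapping rows alias the same memory
)

first_row = windows[0].tolist()   # [1.0, 2.0, 3.0, 4.0]
last_row = windows[-1].tolist()   # [9.0, 10.0, 11.0, 12.0]
```

The 9 x 4 result holds 36 logical elements but still uses only the 12 original values in memory. `as_strided` does no bounds checking, so a wrong shape or stride reads outside the buffer.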



Memory management - pikos




Memory management - chaco




Low-level access
   %timeit talib.moving_average(adj_close, optInTimePeriod=5)
   100000 loops, best of 3: 7.67 us per loop

   %timeit as_strided(adj_close, shape=(len(adj_close)-4, 5), strides=(8, 8)).mean(axis=1)
   10000 loops, best of 3: 28.2 us per loop
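The strided expression timed above can be wrapped in a small helper. This sketch uses NumPy only, so the talib call is not reproduced. The function name `strided_moving_average` and the sample prices are invented for illustration, and `np.convolve` serves only as an independent check of the result.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def strided_moving_average(values, period):
    """Simple moving average over overlapping windows built with as_strided."""
    values = np.ascontiguousarray(values, dtype=np.float64)
    if not 1 <= period <= values.size:
        raise ValueError("period must be between 1 and len(values)")
    step = values.strides[0]  # 8 bytes for float64, as on the slide
    windows = as_strided(
        values,
        shape=(values.size - period + 1, period),
        strides=(step, step),
        writeable=False,
    )
    return windows.mean(axis=1)


# Made-up prices standing in for the adj_close series used in the talk.
adj_close = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

sma = strided_moving_average(adj_close, 5)

# Independent check: a 5-point moving average written as a convolution.
reference = np.convolve(adj_close, np.ones(5) / 5, mode="valid")
```

With a period of 5 the helper returns `len(adj_close) - 4` values, matching the shape used in the timed expression.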




Conclusion

        •It’s obvious, no?




Q&A?



