
High Performance Python: Practical Performant Programming for Humans

Your Python code may run correctly, but you need it to run faster. By exploring the fundamental theory behind design choices, this practical guide helps you gain a deeper understanding of Python’s implementation. You’ll learn how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs.

How can you take advantage of multi-core architectures or clusters? Or build a system that can scale up and down without losing reliability? Experienced Python programmers will learn concrete solutions to these and other issues, along with war stories from companies that use high performance Python for social media analytics, productionized machine learning, and other situations.

  • Get a better grasp of numpy, Cython, and profilers
  • Learn how Python abstracts the underlying computer architecture
  • Use profiling to find bottlenecks in CPU time and memory usage
  • Write efficient programs by choosing appropriate data structures
  • Speed up matrix and vector computations
  • Use tools to compile Python down to machine code
  • Manage multiple I/O and computational operations concurrently
  • Convert multiprocessing code to run on a local or remote cluster
  • Solve large problems while using less RAM

Table of Contents

1. Understanding Performant Python
The Fundamental Computer System
    Computing Units
    Memory Units
    Communications Layers
Putting the Fundamental Elements Together
    Idealized Computing Versus the Python Virtual Machine
                Idealized computing
                Python’s virtual machine
So Why Use Python?
2. Profiling to Find Bottlenecks
Profiling Efficiently
Introducing the Julia Set
Calculating the Full Julia Set
Simple Approaches to Timing—print and a Decorator
Simple Timing Using the Unix time Command
Using the cProfile Module
Using runsnakerun to Visualize cProfile Output
Using line_profiler for Line-by-Line Measurements
Using memory_profiler to Diagnose Memory Usage
Inspecting Objects on the Heap with heapy
Using dowser for Live Graphing of Instantiated Variables
Using the dis Module to Examine CPython Bytecode
       Different Approaches, Different Complexity
Unit Testing During Optimization to Maintain Correctness
      No-op @profile Decorator
      Strategies to Profile Your Code Successfully
Wrap-Up
3. Lists and Tuples
A More Efficient Search
Lists Versus Tuples
      Lists as Dynamic Arrays
      Tuples as Static Arrays
Wrap-Up
4. Dictionaries and Sets
How Do Dictionaries and Sets Work?
      Inserting and Retrieving
      Deletion
      Resizing
      Hash Functions and Entropy
Dictionaries and Namespaces
Wrap-Up
5. Iterators and Generators
Iterators for Infinite Series
Lazy Generator Evaluation
Wrap-Up
6. Matrix and Vector Computation
Introduction to the Problem
Aren’t Python Lists Good Enough?
      Problems with Allocating Too Much
Memory Fragmentation
      Understanding perf
      Making Decisions with perf’s Output
      Enter numpy
Applying numpy to the Diffusion Problem
      Memory Allocations and In-Place Operations
       Selective Optimizations: Finding What Needs to Be Fixed
numexpr: Making In-Place Operations Faster and Easier
A Cautionary Tale: Verify “Optimizations” (scipy)
Wrap-Up
7. Compiling to C
What Sort of Speed Gains Are Possible?
JIT Versus AOT Compilers
Why Does Type Information Help the Code Run Faster?
Using a C Compiler
Reviewing the Julia Set Example
Cython
       Compiling a Pure-Python Version Using Cython
       Cython Annotations to Analyze a Block of Code
       Adding Some Type Annotations
Shed Skin
       Building an Extension Module
       The Cost of the Memory Copies
Cython and numpy
       Parallelizing the Solution with OpenMP on One Machine
Numba
Pythran
PyPy
      Garbage Collection Differences
      Running PyPy and Installing Modules
When to Use Each Technology
      Other Upcoming Projects
      A Note on Graphics Processing Units (GPUs)
      A Wish for a Future Compiler Project
Foreign Function Interfaces
     ctypes
     cffi
     f2py 
     CPython Module
Wrap-Up
8. Concurrency
Introduction to Asynchronous Programming
Serial Crawler
gevent
tornado
AsyncIO
Database Example
Wrap-Up
9. The multiprocessing Module
An Overview of the Multiprocessing Module
Estimating Pi Using the Monte Carlo Method
Estimating Pi Using Processes and Threads
     Using Python Objects
     Random Numbers in Parallel Systems
     Using numpy
Finding Prime Numbers
     Queues of Work
                  Asynchronously adding jobs to the Queue
Verifying Primes Using Interprocess Communication
      Serial Solution
      Naive Pool Solution
      A Less Naive Pool Solution
      Using Manager.Value as a Flag
      Using Redis as a Flag
      Using RawValue as a Flag
      Using mmap as a Flag
      Using mmap as a Flag Redux
Sharing numpy Data with multiprocessing
Synchronizing File and Variable Access
     File Locking
     Locking a Value
Wrap-Up
10. Clusters and Job Queues
Benefits of Clustering
Drawbacks of Clustering
       $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
      Skype’s 24-Hour Global Outage
Common Cluster Designs
How to Start a Clustered Solution
Ways to Avoid Pain When Using Clusters
Three Clustering Solutions
       Using the Parallel Python Module for Simple Local Clusters
       Using IPython Parallel to Support Research
NSQ for Robust Production Clustering
       Queues
       Pub/sub
       Distributed Prime Calculation
Other Clustering Tools to Look At
Wrap-Up
11. Using Less RAM
Objects for Primitives Are Expensive
        The Array Module Stores Many Primitive Objects Cheaply
Understanding the RAM Used in a Collection
Bytes Versus Unicode
Efficiently Storing Lots of Text in RAM
       Trying These Approaches on 8 Million Tokens
                list
                set
               More efficient tree structures
               Directed acyclic word graph (DAWG)
               Marisa trie
               Datrie
               HAT trie
               Using tries (and DAWGs) in production systems
Tips for Using Less RAM
Probabilistic Data Structures
      Very Approximate Counting with a 1-byte Morris Counter
       K-Minimum Values
       Bloom Filters
       LogLog Counter
       Real-World Example
12. Lessons from the Field
Adaptive Lab’s Social Media Analytics (SoMA)
       Python at Adaptive Lab
      SoMA’s Design
      Our Development Methodology
      Maintaining SoMA
      Advice for Fellow Engineers
Making Deep Learning Fly with RadimRehurek.com
       The Sweet Spot
       Lessons in Optimizing
       Wrap-Up
Large-Scale Productionized Machine Learning at Lyst.com
       Python’s Place at Lyst
       Cluster Design
       Code Evolution in a Fast-Moving Start-Up
       Building the Recommendation Engine
       Reporting and Monitoring
       Some Advice
Large-Scale Social Media Analysis at Smesh
       Python’s Role at Smesh
       The Platform
       High Performance Real-Time String Matching
       Reporting, Monitoring, Debugging, and Deployment
PyPy for Successful Web and Data Processing Systems
       Prerequisites
      The Database
      The Web Application
       OCR and Translation
       Task Distribution and Workers
       Conclusion
Task Queues at Lanyrd.com
       Python’s Role at Lanyrd
       Making the Task Queue Performant
       Reporting, Monitoring, Debugging, and Deployment
       Advice to a Fellow Developer

You will get a PDF (8MB) file

$ 2.99
