Text Processing in Python
Book Description
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
- When do I use formal parsers to process structured and semi-structured data? Page 257
- How do I work with full text indexing? Page 199
- What patterns in text can be expressed using regular expressions? Page 204
- How do I find a URL or an email address in text? Page 228
- How do I process a report with a concrete state machine? Page 274
- How do I parse, create, and manipulate Internet formats? Page 345
- How do I handle lossless and lossy compression? Page 454
- How do I find codepoints in Unicode? Page 465
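To give a flavor of the regular-expression question above (finding a URL or an email address in text), here is a minimal Python 3 sketch. The patterns and the function names `find_urls` and `find_emails` are hypothetical illustrations, not the book's own code. They are deliberately loose heuristics that catch common cases and are not complete validators for URL or email address syntax.

```python
import re

# Hypothetical, simplified patterns (not from the book).
# URL: an http/https scheme followed by non-space characters; the last
# character may not be common sentence punctuation, so a trailing "." or ","
# in prose is left out of the match.
URL_RE = re.compile(r"https?://[^\s<>\"']+[^\s<>\"'.,;:!?)]")

# Email: a local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_urls(text):
    """Return every URL-like substring of text, in order of appearance."""
    return URL_RE.findall(text)


def find_emails(text):
    """Return every email-like substring of text, in order of appearance."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    sample = ("Source code is at http://gnosis.cx/TPiP. "
              "Write to someone@example.com for details.")
    print(find_urls(sample))    # ['http://gnosis.cx/TPiP']
    print(find_emails(sample))  # ['someone@example.com']
```

Excluding punctuation from the final character of a URL is a design choice: in running prose, a URL is often followed by a period or comma that belongs to the sentence rather than to the address. The email address in the sample is a placeholder.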
Preface
Section 0.1. What Is Text Processing?
Section 0.2. The Philosophy of Text Processing
Section 0.3. What You'll Need to Use This Book
Section 0.4. Conventions Used in This Book
Section 0.5. A Word on Source Code Examples
Section 0.6. External Resources
Acknowledgments
Chapter 1. Python Basics
Section 1.1. Techniques and Patterns
Section 1.2. Standard Modules
Section 1.3. Other Modules in the Standard Library
Chapter 2. Basic String Operations
Section 2.1. Some Common Tasks
Section 2.2. Standard Modules
Section 2.3. Solving Problems
Chapter 3. Regular Expressions
Section 3.1. A Regular Expression Tutorial
Section 3.2. Some Common Tasks
Section 3.3. Standard Modules
Chapter 4. Parsers and State Machines
Section 4.1. An Introduction to Parsers
Section 4.2. An Introduction to State Machines
Section 4.3. Parser Libraries for Python
Chapter 5. Internet Tools and Techniques
Section 5.1. Working with Email and Newsgroups
Section 5.2. World Wide Web Applications
Section 5.3. Synopses of Other Internet Modules
Section 5.4. Understanding XML
Appendix A. A Selective and Impressionistic Short Review of Python
Section A.1. What Kind of Language Is Python?
Section A.2. Namespaces and Bindings
Section A.3. Datatypes
Section A.4. Flow Control
Section A.5. Functional Programming
Appendix B. A Data Compression Primer
Section B.1. Introduction
Section B.2. Lossless and Lossy Compression
Section B.3. A Data Set Example
Section B.4. Whitespace Compression
Section B.5. Run-Length Encoding
Section B.6. Huffman Encoding
Section B.7. Lempel-Ziv Compression
Section B.8. Solving the Right Problem
Section B.9. A Custom Text Compressor
Section B.10. References
Appendix C. Understanding Unicode
Section C.1. Some Background on Characters
Section C.2. What Is Unicode?
Section C.3. Encodings
Section C.4. Declarations
Section C.5. Finding Codepoints
Section C.6. Resources
Appendix D. A State Machine for Adding Markup to Text
Appendix E. Glossary