Reading Excel in Python
UPDATE@201801: 好早的文章,从 Python 这个文档里面移出来。
分析excel文件在实际生活中还是比较常见的,通常我们的问题就是卡在读取上面(主要原因是因为我对COM接口不太熟悉,不太了解windows编程)。使用xlrd可以通过python来访问excel文件。
- xlrd http://pypi.python.org/pypi/xlrd
- 文档 http://www.lexicon.net/sjmachin/xlrd.htm 看起来可能比较费劲,下面的py文档更加可读
- py文档 packages.python.org/xlrd3(xlrd3是针对python3的,但是和xlrd API是完全兼容的)
- 参考链接 http://www.shello.name/?p=133
- update@201510: 近期发现 pyExcelerator 也挺好用的, 可以同时读写excel.
从文档里面可以看到它是直接分析excel文件的,通过阅读OpenOffice的关于M$ Excel文件格式文档编写的。
Development of this module would not have been possible without the document "OpenOffice.org's Documentation of the Microsoft Excel File Format" ("OOo docs" for short). The latest version is available from OpenOffice.org in PDF format and XML format. Small portions of the OOo docs are reproduced in this document. A study of the OOo docs is recommended for those who wish a deeper understanding of the Excel file layout than the xlrd docs can provide.
- PDF format http://sc.openoffice.org/excelfileformat.pdf
- XML format http://sc.openoffice.org/excelfileformat.sxw
基本上能够得到所有的数据,包括处理date信息,单元格的格式化信息,名字引用信息,内部统一使用Unicode(如果内部使用其他编码的话会自动转换)
Features:
- Support for handling dates, and documentation of Excel date problems and how to avoid them.
- Unicode aware; correctly handles "compressed" Unicode in modern files; decodes legacy charsets in older files (if Python has the codec).
- Extracts all data (including Booleans and error-values)
- Extracts named references
- Extracts formatting information for cells (number format, font, alignment, borders, backgrounds, protection) and rows/columns (default height/width, etc). This effort was funded by Simplistix Ltd.
- Further information: APIs, README, HISTORY
但是也有一些信息没有提取出来(但是对于很多简单的应用来说的话是没有必要的)。包括表格,图片,宏等嵌入对象,VBA模块,公式,注释以及超链等。
Exclusions: xlrd will not attempt to decode password-protected (encrypted) files. Otherwise, it will safely and reliably ignore any of these if present:
- Charts, Macros, Pictures, any other embedded object. WARNING: currently this includes embedded worksheets.
- Visual Basic (VBA) modules
- Formulas (results of formula calculations are extracted, of course)
- Comments and hyperlinks
来个简单的例子:
#!/usr/bin/env python #coding:utf-8 #Copyright (C) dirlt import xlrd book=xlrd.open_workbook('x.xls') print '--------------------sheets:#%d--------------------'%(book.nsheets) for i in range(0,book.nsheets): sheet=book.sheet_by_index(i) print '----------sheet%d:\'%s\', rows:%d, cols:%d----------'%(i,sheet.name,sheet.nrows,sheet.ncols) for r in range(0,sheet.nrows): for c in range(0,sheet.ncols): cell=sheet.cell(r,c) if(cell): # access cell.ctype. print '%s '%(cell.value), print ''