15.2.153. camcops_server.cc_modules.cc_spreadsheet¶
camcops_server/cc_modules/cc_spreadsheet.py
Copyright (C) 2012, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CamCOPS.
CamCOPS is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CamCOPS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CamCOPS. If not, see <https://www.gnu.org/licenses/>.
Helper functions/classes for spreadsheet-style tab-separated value (TSV) (and related) exports.
- class camcops_server.cc_modules.cc_spreadsheet.SpreadsheetCollection[source]¶
A collection of
camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage
pages (spreadsheets), like an Excel workbook.- add_page(page: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None [source]¶
Adds a new page to our collection. If the new page has the same name as an existing page, rows from the new page are added to the existing page. Does nothing if the new page is empty.
- add_pages(pages: List[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage]) None [source]¶
Adds all
pages
to our collection, viaadd_page()
.
- as_ods() bytes [source]¶
Returns the TSV collection as an ODS (OpenOffice spreadsheet document) file.
- as_r() str [source]¶
Returns data as an R script.
This could be more sophisticated, e.g. creating factors with appropriate levels (etc.).
- as_zip(encoding: str = 'utf-8') bytes [source]¶
Returns the TSV collection as a ZIP file containing TSV files.
- Parameters
encoding – encoding to use when writing the TSV files
- delete_columns(headings: Container[str]) None [source]¶
Across all pages, removes columns with the specified heading names. Used to simplify spreadsheets.
- get_pages_with_valid_sheet_names() Dict[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage, str] [source]¶
Returns an ordered mapping from
SpreadsheetPage
objects to their sheet names.
- static get_sheet_title(page: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) str [source]¶
Returns a worksheet name for a
SpreadsheetPage
.See
openpyxl/workbook/child.py
.Excel prohibits
\
,*
,?
,:
,/
,[
,]
LibreOffice also prohibits
'
as first or last character but let’s just replace that globally.
- get_tsv_file(page_name: str) str [source]¶
Returns a TSV file for a named page.
- Raises
AssertionError –
- static make_sheet_names_unique(name_dict: Dict[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage, str]) None [source]¶
Modifies (in place) a mapping from
SpreadsheetPage
to worksheet names, such that all page names are unique.See also
avoid_duplicate_name()
inopenpxl/workbook/child.py
We keep the 31 character restriction
- page_with_name(page_name: str) Optional[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage] [source]¶
Returns the page with the specific name, or
None
if no such page exists.
- write_ods(file: Union[str, BinaryIO]) None [source]¶
Writes an ODS (OpenOffice spreadsheet document) to a file.
- Parameters
file – filename or file-like object
- write_r(filename: str, encoding: str = 'utf-8') None [source]¶
Write the contents in R format to a file.
- Parameters
filename – filename or file-like object
encoding – encoding to use
- write_xlsx(file: Union[str, BinaryIO]) None [source]¶
Write the contents in XLSX (Excel) format to a file.
- Parameters
file – filename or file-like object
- write_zip(file: Union[str, BinaryIO], encoding: str = 'utf-8', compression: int = 8) None [source]¶
Writes data to a file, as a ZIP file of TSV files.
- Parameters
file – filename or file-like object
encoding – encoding to use when writing the TSV files
compression – compression method to use
Choice of compression method: see
https://en.wikipedia.org/wiki/Zip_(file_format)#Compression_methods
Note also that
openpyxl
usesZIP_DEFLATED
, which seems to be the most portable if not the best compression.
- class camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage(name: str, rows: List[Union[Dict[str, Any], collections.OrderedDict]])[source]¶
Represents a single “spreadsheet” page, e.g. for TSV/Excel/ODS output.
- __init__(name: str, rows: List[Union[Dict[str, Any], collections.OrderedDict]]) None [source]¶
- Parameters
name – name for the whole sheet
rows – list of rows, where each row is a dictionary mapping column name to value
- add_or_set_column(heading: str, values: List[Any]) None [source]¶
Set the column labelled
heading
so it contains the values specified invalues
. The length ofvalues
must equal the number of rows that we already contain.- Raises
AssertionError –
the number of existing rows –
- add_or_set_columns_from_page(other: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None [source]¶
This function presupposes that
self
andother
are two pages (“spreadsheets”) with matching rows.It updates values or creates columns in
self
such that the values from all columns inother
are written to the corresponding rows ofself
.- Raises
AssertionError –
the same number of rows. –
- add_or_set_value(heading: str, value: Any) None [source]¶
If we contain only a single row, this function will set the value for a given column (
heading
) tovalue
.- Raises
AssertionError –
- add_rows_from_page(other: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None [source]¶
Add all rows from
other
toself
.
- delete_columns(headings: Container[str]) None [source]¶
Removes columns with the specified heading names. Used to simplify spreadsheets.
Since our rows are a dictionary, and our export functions are based on the headings, all we have to do is to delete the unwanted headings.
- property empty: bool¶
Do we have zero rows?
- classmethod from_headings_rows(name: str, headings: List[str], rows: List[Sequence[Any]]) camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage [source]¶
Creates a SpreadsheetPage object using a list of headings and the row data as a list of lists.
- classmethod from_result(name: str, rp: sqlalchemy.engine.cursor.CursorResult) camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage [source]¶
Creates a SpreadsheetPage object from an SQLAlchemy Result.
- Parameters
rp – A :class:` sqlalchemy.engine.Result`.
name – Name for this sheet.
- get_tsv(dialect: str = 'excel-tab') str [source]¶
Returns the entire page (sheet) as TSV: one header row and then lots of data rows.
For the dialect, see https://docs.python.org/3/library/csv.html#csv.excel_tab.
For CSV files, see RGC 4180: https://tools.ietf.org/html/rfc4180.
For TSV files, see https://www.iana.org/assignments/media-types/text/tab-separated-values.
Test code:
import io import csv from typing import List def test(row: List[str], dialect: str = "excel-tab") -> str: f = io.StringIO() writer = csv.writer(f, dialect=dialect) writer.writerow(row) return f.getvalue() test(["hello", "world"]) test(["hello\ttab", "world"]) # actual tab within double quotes test(["hello\nnewline", "world"]) # actual newline within double quotes test(['hello"doublequote', "world"]) # doubled double quote within double quotes
- property plainrows: List[List[Any]]¶
Returns a list of rows, where each row is a list of values. Does not include a “header” row.
Compare
rows
, which is a list of dictionaries.
- r_data_table_definition() str [source]¶
Returns a string to define this object as a
data.table
in R.See also:
- r_object_name() str [source]¶
Name of the object when imported into R. The main thing: no leading underscores.
- spreadsheetrows(converter: Callable[[Any], Any]) List[List[Any]] [source]¶
Like
plainrows()
, but (a) ensures every cell is converted to a value that can be sent to a spreadsheet converted (e.g. ODS, XLSX), and (b) includes a header row.
- camcops_server.cc_modules.cc_spreadsheet.benchmark_save(xlsx_filename: str = 'test.xlsx', ods_filename: str = 'test.ods', tsv_zip_filename: str = 'test.zip', r_filename: str = 'test.R') None [source]¶
Use with:
from cardinal_pythonlib.logs import main_only_quicksetup_rootlogger from camcops_server.cc_modules.cc_spreadsheet import benchmark_save main_only_quicksetup_rootlogger() benchmark_save()
- Parameters
xlsx_filename – XLSX file to create
ods_filename – ODS file to create
tsv_zip_filename – TSV ZIP file to create
r_filename – R script to create
Problem in Nov 2019 is that ODS is extremely slow. Rough timings:
TSV ZIP: about 4.1 Mb, about 0.2 s. Good.
XLSX (via openpyxl): about 4.6 Mb, 16 seconds.
XLSX (via pyexcel_xlsx): about 4.6 Mb, 16 seconds.
ODS (via odswriter): about 53 Mb, 56 seconds.
ODS (via pyexcel_ods3): about 2.8 Mb, 29 seconds.