15.2.153. camcops_server.cc_modules.cc_spreadsheet

camcops_server/cc_modules/cc_spreadsheet.py


Copyright (C) 2012, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CamCOPS.

CamCOPS is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CamCOPS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CamCOPS. If not, see <https://www.gnu.org/licenses/>.


Helper functions/classes for spreadsheet-style tab-separated value (TSV) (and related) exports.

class camcops_server.cc_modules.cc_spreadsheet.SpreadsheetCollection[source]

A collection of camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage pages (spreadsheets), like an Excel workbook.

__init__() None[source]
add_page(page: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None[source]

Adds a new page to our collection. If the new page has the same name as an existing page, rows from the new page are added to the existing page. Does nothing if the new page is empty.

add_pages(pages: List[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage]) None[source]

Adds all pages to our collection, via add_page().

as_ods() bytes[source]

Returns the TSV collection as an ODS (OpenOffice spreadsheet document) file.

as_r() str[source]

Returns data as an R script.

This could be more sophisticated, e.g. creating factors with appropriate levels (etc.).

as_xlsx() bytes[source]

Returns the TSV collection as an XLSX (Excel) file.

as_zip(encoding: str = 'utf-8') bytes[source]

Returns the TSV collection as a ZIP file containing TSV files.

Parameters

encoding – encoding to use when writing the TSV files

delete_columns(headings: Container[str]) None[source]

Across all pages, removes columns with the specified heading names. Used to simplify spreadsheets.

delete_page(page_name: str) None[source]

Delete any page with the name specified.

delete_pages(page_names: Container[str]) None[source]

Delete pages with the names specified.

get_page_names() List[str][source]

Return a list of the names of all our pages.

get_pages_with_valid_sheet_names() Dict[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage, str][source]

Returns an ordered mapping from SpreadsheetPage objects to their sheet names.

static get_sheet_title(page: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) str[source]

Returns a worksheet name for a SpreadsheetPage.

See openpyxl/workbook/child.py.

  • Excel prohibits \, *, ?, :, /, [, ]

  • LibreOffice also prohibits ' as first or last character but let’s just replace that globally.

get_tsv_file(page_name: str) str[source]

Returns a TSV file for a named page.

Raises

AssertionError

static make_sheet_names_unique(name_dict: Dict[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage, str]) None[source]

Modifies (in place) a mapping from SpreadsheetPage to worksheet names, such that all page names are unique.

  • See also avoid_duplicate_name() in openpxl/workbook/child.py

  • We keep the 31 character restriction

page_with_name(page_name: str) Optional[camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage][source]

Returns the page with the specific name, or None if no such page exists.

sort_headings_within_all_pages() None[source]

Sort headings within each of our pages.

sort_pages() None[source]

Sort our pages by their page name.

write_ods(file: Union[str, BinaryIO]) None[source]

Writes an ODS (OpenOffice spreadsheet document) to a file.

Parameters

file – filename or file-like object

write_r(filename: str, encoding: str = 'utf-8') None[source]

Write the contents in R format to a file.

Parameters
  • filename – filename or file-like object

  • encoding – encoding to use

write_xlsx(file: Union[str, BinaryIO]) None[source]

Write the contents in XLSX (Excel) format to a file.

Parameters

file – filename or file-like object

write_zip(file: Union[str, BinaryIO], encoding: str = 'utf-8', compression: int = 8) None[source]

Writes data to a file, as a ZIP file of TSV files.

Parameters
  • file – filename or file-like object

  • encoding – encoding to use when writing the TSV files

  • compression – compression method to use

Choice of compression method: see

Note also that openpyxl uses ZIP_DEFLATED, which seems to be the most portable if not the best compression.

class camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage(name: str, rows: List[Union[Dict[str, Any], collections.OrderedDict]])[source]

Represents a single “spreadsheet” page, e.g. for TSV/Excel/ODS output.

__init__(name: str, rows: List[Union[Dict[str, Any], collections.OrderedDict]]) None[source]
Parameters
  • name – name for the whole sheet

  • rows – list of rows, where each row is a dictionary mapping column name to value

add_or_set_column(heading: str, values: List[Any]) None[source]

Set the column labelled heading so it contains the values specified in values. The length of values must equal the number of rows that we already contain.

Raises
  • AssertionError

  • the number of existing rows

add_or_set_columns_from_page(other: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None[source]

This function presupposes that self and other are two pages (“spreadsheets”) with matching rows.

It updates values or creates columns in self such that the values from all columns in other are written to the corresponding rows of self.

Raises
  • AssertionError

  • the same number of rows.

add_or_set_value(heading: str, value: Any) None[source]

If we contain only a single row, this function will set the value for a given column (heading) to value.

Raises

AssertionError

add_rows_from_page(other: camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage) None[source]

Add all rows from other to self.

delete_columns(headings: Container[str]) None[source]

Removes columns with the specified heading names. Used to simplify spreadsheets.

Since our rows are a dictionary, and our export functions are based on the headings, all we have to do is to delete the unwanted headings.

property empty: bool

Do we have zero rows?

classmethod from_headings_rows(name: str, headings: List[str], rows: List[Sequence[Any]]) camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage[source]

Creates a SpreadsheetPage object using a list of headings and the row data as a list of lists.

classmethod from_result(name: str, rp: sqlalchemy.engine.cursor.CursorResult) camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage[source]

Creates a SpreadsheetPage object from an SQLAlchemy Result.

Parameters
  • rp – A :class:` sqlalchemy.engine.Result`.

  • name – Name for this sheet.

get_tsv(dialect: str = 'excel-tab') str[source]

Returns the entire page (sheet) as TSV: one header row and then lots of data rows.

For the dialect, see https://docs.python.org/3/library/csv.html#csv.excel_tab.

For CSV files, see RGC 4180: https://tools.ietf.org/html/rfc4180.

For TSV files, see https://www.iana.org/assignments/media-types/text/tab-separated-values.

Test code:

import io
import csv
from typing import List

def test(row: List[str], dialect: str = "excel-tab") -> str:
    f = io.StringIO()
    writer = csv.writer(f, dialect=dialect)
    writer.writerow(row)
    return f.getvalue()

test(["hello", "world"])
test(["hello\ttab", "world"])  # actual tab within double quotes
test(["hello\nnewline", "world"])  # actual newline within double quotes
test(['hello"doublequote', "world"])  # doubled double quote within double quotes
property plainrows: List[List[Any]]

Returns a list of rows, where each row is a list of values. Does not include a “header” row.

Compare rows, which is a list of dictionaries.

r_data_table_definition() str[source]

Returns a string to define this object as a data.table in R.

See also:

r_object_name() str[source]

Name of the object when imported into R. The main thing: no leading underscores.

sort_headings() None[source]

Sort our headings internally.

spreadsheetrows(converter: Callable[[Any], Any]) List[List[Any]][source]

Like plainrows(), but (a) ensures every cell is converted to a value that can be sent to a spreadsheet converted (e.g. ODS, XLSX), and (b) includes a header row.

write_to_odswriter_ods_worksheet(ws: None) None[source]

Writes data from this page to an existing odswriter ODS sheet.

write_to_openpyxl_xlsx_worksheet(ws: None) None[source]

Writes data from this page to an existing openpyxl XLSX worksheet.

camcops_server.cc_modules.cc_spreadsheet.benchmark_save(xlsx_filename: str = 'test.xlsx', ods_filename: str = 'test.ods', tsv_zip_filename: str = 'test.zip', r_filename: str = 'test.R') None[source]

Use with:

from cardinal_pythonlib.logs import main_only_quicksetup_rootlogger
from camcops_server.cc_modules.cc_spreadsheet import benchmark_save
main_only_quicksetup_rootlogger()
benchmark_save()
Parameters
  • xlsx_filename – XLSX file to create

  • ods_filename – ODS file to create

  • tsv_zip_filename – TSV ZIP file to create

  • r_filename – R script to create

Problem in Nov 2019 is that ODS is extremely slow. Rough timings:

  • TSV ZIP: about 4.1 Mb, about 0.2 s. Good.

  • XLSX (via openpyxl): about 4.6 Mb, 16 seconds.

  • XLSX (via pyexcel_xlsx): about 4.6 Mb, 16 seconds.

  • ODS (via odswriter): about 53 Mb, 56 seconds.

  • ODS (via pyexcel_ods3): about 2.8 Mb, 29 seconds.

camcops_server.cc_modules.cc_spreadsheet.file_size(filename: str) int[source]

Returns a file’s size in bytes.