15.2.97. camcops_server.cc_modules.cc_export

camcops_server/cc_modules/cc_export.py


Copyright (C) 2012-2020 Rudolf Cardinal (rudolf@pobox.com).

This file is part of CamCOPS.

CamCOPS is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CamCOPS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CamCOPS. If not, see <https://www.gnu.org/licenses/>.


Export and research dump functions.

Export design:

WHICH RECORDS TO SEND?

The most powerful mechanism is not to have a sending queue (which would then require careful multi-instance locking), but to have a “sent” log. That way:

  • A record needs sending if it’s not in the sent log (for an appropriate recipient).

  • You can add a new recipient and the system will know about the (new) backlog automatically.

  • You can specify criteria, e.g. don’t upload records before 1/1/2014, and modify that later, and it would catch up with the backlog.

  • Successes and failures are logged in the same table.

  • Multiple recipients are handled with ease.

  • No need to alter database.pl code that receives from tablets.

  • Can run with a simple cron job.
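
The "sent log" idea above can be sketched in a few lines. This is an illustration only, not the CamCOPS implementation: the SentLogEntry class, the records_needing_sending function, and the recipient names are hypothetical, and the sketch assumes that a record with only failed attempts should be retried.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SentLogEntry:
    """One row of a hypothetical 'sent' log (successes and failures)."""
    record_id: int
    recipient: str
    success: bool


def records_needing_sending(
    record_ids: Iterable[int],
    recipient: str,
    sent_log: Iterable[SentLogEntry],
) -> List[int]:
    """Return the records with no successful log entry for this recipient."""
    already_sent: Set[int] = {
        entry.record_id
        for entry in sent_log
        if entry.recipient == recipient and entry.success
    }
    return [r for r in record_ids if r not in already_sent]


log = [
    SentLogEntry(1, "hospital_a", True),
    SentLogEntry(2, "hospital_a", False),  # assumption: failures are retried
]
print(records_needing_sending([1, 2, 3], "hospital_a", log))  # [2, 3]
print(records_needing_sending([1, 2, 3], "hospital_b", log))  # [1, 2, 3]
```

Note that the new recipient "hospital_b" picks up the whole backlog without any queue having to be populated for it.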

LOCKING

MESSAGE QUEUE AND BACKEND

Thoughts as of 2018-12-22.

class camcops_server.cc_modules.cc_export.DownloadOptions(user_id: int, viewtype: str, delivery_mode: str, spreadsheet_sort_by_heading: bool = False, db_include_blobs: bool = False, db_patient_id_per_row: bool = False, include_information_schema_columns: bool = True)[source]

Represents options for the process of the user downloading tasks.

__init__(user_id: int, viewtype: str, delivery_mode: str, spreadsheet_sort_by_heading: bool = False, db_include_blobs: bool = False, db_patient_id_per_row: bool = False, include_information_schema_columns: bool = True) → None[source]
Parameters
  • user_id – ID of the user creating the request (may be needed to pass to the back-end)

  • viewtype – file format for receiving data (e.g. XLSX, SQLite)

  • delivery_mode – method of delivery (e.g. immediate, e-mail)

  • spreadsheet_sort_by_heading – (For spreadsheets.) Sort columns within each page by heading name?

  • db_include_blobs – (For database downloads.) Include BLOBs?

  • db_patient_id_per_row – (For database downloads.) Denormalize by including the patient ID in all rows of patient-related tables?

  • include_information_schema_columns – Include descriptions of the columns provided?

class camcops_server.cc_modules.cc_export.OdsExporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]

Converts a set of tasks to an OpenOffice ODS file.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

class camcops_server.cc_modules.cc_export.RExporter(*args, **kwargs)[source]

Converts a set of tasks to an R script.

__init__(*args, **kwargs) → None[source]

Parameters
  • req – a camcops_server.cc_modules.cc_request.CamcopsRequest

  • collection – a camcops_server.cc_modules.cc_taskcollection.TaskCollection

  • options – DownloadOptions governing the download

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

class camcops_server.cc_modules.cc_export.SqlExporter(*args, **kwargs)[source]

Converts a set of tasks to the textual SQL needed to create an SQLite file.

__init__(*args, **kwargs) → None[source]

Parameters
  • req – a camcops_server.cc_modules.cc_request.CamcopsRequest

  • collection – a camcops_server.cc_modules.cc_taskcollection.TaskCollection

  • options – DownloadOptions governing the download

download_now() → pyramid.response.Response[source]

Downloads the data dump in the selected format.

get_data_response(body: bytes, filename: str) → pyramid.response.Response[source]

Unused.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

get_sql() → str[source]

Returns SQL text representing the SQLite database.

class camcops_server.cc_modules.cc_export.SqliteExporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]

Converts a set of tasks to an SQLite binary file.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

get_sqlite_data(as_text: bool) → Union[bytes, str][source]

Returns data as a binary SQLite database, or SQL text to create it.

Parameters

as_text – textual SQL, rather than binary SQLite?

Returns

bytes or str, according to as_text
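
A minimal sketch of the "binary SQLite or SQL text" distinction, using only the standard library sqlite3 module. It is not the CamCOPS code: the sqlite_data function and the demo table are hypothetical, and the real exporter populates the database from the task collection.

```python
import os
import sqlite3
import tempfile
from typing import Union


def sqlite_data(as_text: bool) -> Union[bytes, str]:
    """Build a tiny database; return it as SQL text or as a binary file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "dump.sqlite")
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, x TEXT)")
            conn.execute("INSERT INTO demo (x) VALUES ('hello')")
            conn.commit()
            if as_text:
                # iterdump() yields the SQL statements that recreate the DB.
                return "\n".join(conn.iterdump())
        finally:
            conn.close()
        # Binary form: the bytes of the SQLite file itself.
        with open(path, "rb") as f:
            return f.read()
```

Calling sqlite_data(True) gives a str of SQL statements; sqlite_data(False) gives the bytes of a file that the sqlite3 command-line tool can open directly.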

class camcops_server.cc_modules.cc_export.TaskCollectionExporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]

Class to provide tasks for user download.

__init__(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]
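Parameters
  • req – a camcops_server.cc_modules.cc_request.CamcopsRequest

  • collection – a camcops_server.cc_modules.cc_taskcollection.TaskCollection

  • options – DownloadOptions governing the download
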
create_user_download_and_email() → None[source]

Creates a user download, and e-mails the user to let them know.

download_now() → pyramid.response.Response[source]

Downloads the data dump in the selected format.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

get_filename() → str[source]

Returns the filename for the download.

get_tsv_collection() → camcops_server.cc_modules.cc_tsv.TsvCollection[source]

Converts the collection of tasks to a collection of spreadsheet-style data. Also audits the request as a basic data dump.

Returns

a camcops_server.cc_modules.cc_tsv.TsvCollection object

immediate_response(req: CamcopsRequest) → pyramid.response.Response[source]

Returns either a Response with the data, or a Response saying how the user will obtain their data later.

Parameters

req – a camcops_server.cc_modules.cc_request.CamcopsRequest

schedule_download() → None[source]

Schedule a background export to a file that the user can download later.

schedule_email() → None[source]

Schedule the export asynchronously and e-mail the logged-in user when done.

send_by_email() → None[source]

Send the data dump by e-mail to the logged-in user.

to_file() → Tuple[str, bytes][source]

Returns the tuple filename, file_contents.

class camcops_server.cc_modules.cc_export.TsvZipExporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]

Converts a set of tasks to a set of TSV (tab-separated value) files (one per table) in a ZIP file.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.
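
The one-TSV-file-per-table ZIP layout described for this class can be sketched with the standard library alone. The tsv_zip function and the table names used with it are hypothetical; the real exporter builds its pages from a TsvCollection rather than from plain lists.

```python
import csv
import io
import zipfile
from typing import Dict, List, Sequence


def tsv_zip(tables: Dict[str, List[Sequence[object]]]) -> bytes:
    """Return a ZIP archive holding one <table>.tsv file per table."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as z:
        for name, rows in tables.items():
            text = io.StringIO()
            writer = csv.writer(text, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)  # first row is taken to be the header
            z.writestr(f"{name}.tsv", text.getvalue())
    return buffer.getvalue()
```

The result is bytes, matching the get_file_body() return type, so it can be written to disk or placed in an HTTP response body unchanged.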

class camcops_server.cc_modules.cc_export.UserDownloadFile(filename: str, directory: str = '', permitted_lifespan_min: float = 0, req: CamcopsRequest = None)[source]

Represents a file that has been generated for the user to download.

Test code:

from camcops_server.cc_modules.cc_export import UserDownloadFile
x = UserDownloadFile("/etc/hosts")

print(x.when_last_modified)  # should match output of: ls -l /etc/hosts

many = UserDownloadFile.from_directory_scan("/etc")
__init__(filename: str, directory: str = '', permitted_lifespan_min: float = 0, req: CamcopsRequest = None) → None[source]
Parameters

Notes:

  • The Unix ls command shows timestamps in the current timezone. Try TZ=utc ls -l <filename> or TZ="America/New_York" ls -l <filename> to see this.

  • The underlying timestamp is the time (in seconds) since the Unix “epoch”, which is 00:00:00 UTC on 1 Jan 1970 (https://en.wikipedia.org/wiki/Unix_time).
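
To make the notes above concrete, here is a small sketch that reads a file's modification time as seconds since the epoch and converts it to a timezone-aware UTC datetime. The helper name mtime_as_utc is hypothetical, and the standard library datetime is used here rather than the pendulum types that appear in this module's signatures.

```python
import os
from datetime import datetime, timezone


def mtime_as_utc(path: str) -> datetime:
    """Return the file's modification time as an aware UTC datetime."""
    seconds_since_epoch = os.path.getmtime(path)  # a float, timezone-free
    return datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)
```

The raw number is the same whatever the local timezone; only the displayed form changes, which is why ls output varies with TZ.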

property contents

The file contents. May raise OSError if the read fails.

delete() → None[source]

Deletes the file. Does not raise an exception if the file does not exist.

property delete_form

Returns HTML for a form to delete this file.

property download_url

Returns a URL to download this file.

classmethod from_directory_scan(directory: str, permitted_lifespan_min: float = 0, req: CamcopsRequest = None) → List[UserDownloadFile][source]

Scans the directory and returns a list of UserDownloadFile objects, one for each file in the directory.

For each object, directory is the root directory (our parameter here), and filename is the filename RELATIVE to that.
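
A sketch of the "filename relative to the root directory" convention, using os.walk and os.path.relpath. The relative_filenames function is hypothetical, and whether the real scan descends into subdirectories is an assumption made here for illustration.

```python
import os
from typing import List


def relative_filenames(directory: str) -> List[str]:
    """Return every file under `directory`, as paths relative to it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, directory))
    return sorted(found)
```

Each returned name can be re-joined to the root with os.path.join(directory, name) to recover the full path.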

Parameters
older_than(when: pendulum.datetime.DateTime) → bool[source]

Was the file created before the specified time?

property size

Size of the file, in bytes. Returns None if the file does not exist.

property size_str

Returns a pretty-format string describing the file’s size.

property time_left

Returns the amount of time that this file has left to live before the server will delete it. Returns None if the file does not exist.
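
One plausible way to compute such a value is sketched below. It assumes that the permitted lifespan is counted from the file's modification time, which this page does not state; the standalone time_left function and its now parameter are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional


def time_left(
    path: str,
    permitted_lifespan_min: float,
    now: Optional[datetime] = None,
) -> Optional[timedelta]:
    """Time remaining before expiry, or None if the file does not exist."""
    try:
        mtime_s = os.path.getmtime(path)
    except FileNotFoundError:
        return None
    modified = datetime.fromtimestamp(mtime_s, tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Assumption: expiry = modification time + permitted lifespan.
    return modified + timedelta(minutes=permitted_lifespan_min) - now
```

A negative result means the file has already outlived its permitted lifespan.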

property time_left_str

A string version of time_left.

property when_last_modified

Returns the file’s modification time, or None if it doesn’t exist.

(Creation time is harder! See https://stackoverflow.com/questions/237079/how-to-get-file-creation-modification-date-times-in-python.)

property when_last_modified_str

Returns a formatted string with the file’s modification time.

class camcops_server.cc_modules.cc_export.XlsxExporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions)[source]

Converts a set of tasks to an Excel XLSX file.

get_file_body() → bytes[source]

Returns binary data to be stored as a file.

camcops_server.cc_modules.cc_export.export(req: CamcopsRequest, recipient_names: List[str] = None, all_recipients: bool = False, via_index: bool = True, schedule_via_backend: bool = False) → None[source]

Called from the command line.

Exports all relevant tasks (pending incremental exports, or everything if applicable) for specified export recipients.

Obtains a file lock, then iterates through all recipients.

Parameters
  • req – a camcops_server.cc_modules.cc_request.CamcopsRequest

  • recipient_names – list of export recipient names (as per the config file)

  • all_recipients – use all recipients?

  • via_index – use the task index (faster)?

  • schedule_via_backend – schedule jobs via the backend instead?
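
The "obtain a file lock, then iterate through all recipients" pattern described above can be sketched as follows. This is not how CamCOPS implements its lock: the exclusive_lockfile and export_all names are hypothetical, and the lock here relies on atomic file creation with O_EXCL as a simple stand-in.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def exclusive_lockfile(path: str) -> Iterator[None]:
    """Hold a lock by atomically creating `path`; remove it on exit."""
    # O_EXCL makes creation fail with FileExistsError if the file exists,
    # so a second process cannot take the lock while we hold it.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def export_all(
    recipients: List[str],
    lock_path: str,
    export_one: Callable[[str], None],
) -> None:
    """Export to each recipient in turn while holding a single lock."""
    with exclusive_lockfile(lock_path):
        for recipient in recipients:
            export_one(recipient)
```

A crashed process would leave the lock file behind in this simple scheme, so a production version needs some form of stale-lock handling.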

camcops_server.cc_modules.cc_export.export_task(req: CamcopsRequest, recipient: camcops_server.cc_modules.cc_exportrecipient.ExportRecipient, task: camcops_server.cc_modules.cc_task.Task) → None[source]

Exports a single task, checking that it remains valid to do so.

Parameters
camcops_server.cc_modules.cc_export.export_tasks_individually(req: CamcopsRequest, recipient: camcops_server.cc_modules.cc_exportrecipient.ExportRecipient, via_index: bool = True, schedule_via_backend: bool = False) → None[source]

Exports all necessary tasks for a recipient.

Parameters
camcops_server.cc_modules.cc_export.export_whole_database(req: CamcopsRequest, recipient: camcops_server.cc_modules.cc_exportrecipient.ExportRecipient, via_index: bool = True) → None[source]

Exports to a database.

Holds a recipient-specific file lock in the process.

Parameters
camcops_server.cc_modules.cc_export.gen_audited_tasks_by_task_class(collection: TaskCollection, audit_descriptions: List[str]) → Generator[camcops_server.cc_modules.cc_task.Task, None, None][source]

Generates tasks from a collection, across task classes, simultaneously adding to an audit description. Used for user-triggered downloads.

Parameters
Yields

camcops_server.cc_modules.cc_task.Task objects
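
The "generate items while adding to an audit description" pattern can be sketched with a plain generator that appends to a caller-supplied list. Everything here is a stand-in: gen_audited is hypothetical, tasks are represented by integers, and the audit string format is invented for the example.

```python
from typing import Dict, Generator, List


def gen_audited(
    tasks_by_class: Dict[str, List[int]],
    audit_descriptions: List[str],
) -> Generator[int, None, None]:
    """Yield tasks class by class, recording what was yielded."""
    for class_name, task_ids in tasks_by_class.items():
        ids = ",".join(str(i) for i in task_ids)
        # Side effect: the caller's list grows as the generator is consumed.
        audit_descriptions.append(f"{class_name}: {ids}")
        yield from task_ids
```

Because generators are lazy, the audit list is only complete once the caller has consumed the generator fully, so any audit message should be written after iteration finishes.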

camcops_server.cc_modules.cc_export.gen_audited_tasks_for_task_class(collection: TaskCollection, cls: Type[camcops_server.cc_modules.cc_task.Task], audit_descriptions: List[str]) → Generator[camcops_server.cc_modules.cc_task.Task, None, None][source]

Generates tasks from a collection, for a given task class, simultaneously adding to an audit description. Used for user-triggered downloads.

Parameters
Yields

camcops_server.cc_modules.cc_task.Task objects

camcops_server.cc_modules.cc_export.get_information_schema_query(req: CamcopsRequest) → sqlalchemy.engine.result.ResultProxy[source]

Returns an SQLAlchemy query object that fetches the INFORMATION_SCHEMA.COLUMNS information from our source database.

This is not sensitive; there is no data, just structure/comments.

camcops_server.cc_modules.cc_export.get_information_schema_tsv_page(req: CamcopsRequest, page_name: str = '_camcops_information_schema_columns') → camcops_server.cc_modules.cc_tsv.TsvPage[source]

Returns the server database’s INFORMATION_SCHEMA.COLUMNS table as a camcops_server.cc_modules.cc_tsv.TsvPage.

camcops_server.cc_modules.cc_export.make_exporter(req: CamcopsRequest, collection: TaskCollection, options: camcops_server.cc_modules.cc_export.DownloadOptions) → camcops_server.cc_modules.cc_export.TaskCollectionExporter[source]
Parameters
Returns

a TaskCollectionExporter

Raises

HTTPBadRequest
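
A factory of this kind is commonly written as a lookup from the requested format to an exporter class. The sketch below shows that pattern only: the class names, the viewtype strings "tsv_zip" and "sql", and the use of ValueError in place of HTTPBadRequest are all assumptions, and the function takes a bare viewtype rather than the real (req, collection, options) arguments.

```python
from typing import Dict, Type


class Exporter:
    """Stand-in base class (hypothetical)."""

    def get_file_body(self) -> bytes:
        raise NotImplementedError


class TsvZip(Exporter):
    def get_file_body(self) -> bytes:
        return b"tsv-zip"


class Sql(Exporter):
    def get_file_body(self) -> bytes:
        return b"sql"


# Hypothetical viewtype strings mapped to exporter classes.
EXPORTERS: Dict[str, Type[Exporter]] = {"tsv_zip": TsvZip, "sql": Sql}


def make_exporter_sketch(viewtype: str) -> Exporter:
    """Return an exporter for the viewtype, or raise for unknown values."""
    try:
        return EXPORTERS[viewtype]()
    except KeyError:
        # The documented function raises HTTPBadRequest at this point.
        raise ValueError(f"Unknown viewtype: {viewtype!r}") from None
```

Keeping the mapping in one dictionary means that adding a format touches a single line, and unknown formats fail in one well-defined place.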

camcops_server.cc_modules.cc_export.print_export_queue(req: CamcopsRequest, recipient_names: List[str] = None, all_recipients: bool = False, via_index: bool = True, pretty: bool = False) → None[source]

Called from the command line.

Shows tasks that would be exported.

Parameters
  • req – a camcops_server.cc_modules.cc_request.CamcopsRequest

  • recipient_names – list of export recipient names (as per the config file)

  • all_recipients – use all recipients?

  • via_index – use the task index (faster)?

  • pretty – use str(task) not repr(task) (prettier, slower because it has to query the patient)

camcops_server.cc_modules.cc_export.write_information_schema_to_dst(req: CamcopsRequest, dst_session: sqlalchemy.orm.session.Session, dest_table_name: str = '_camcops_information_schema_columns') → None[source]

Writes the server’s information schema to a separate database session (which will be an SQLite database being created for download).

There must be no open transactions (i.e. please COMMIT before you call this function), since we need to create a table.