14.2.83. camcops_server.cc_modules.cc_snomed

camcops_server/cc_modules/cc_snomed.py


Copyright (C) 2012-2019 Rudolf Cardinal (rudolf@pobox.com).

This file is part of CamCOPS.

CamCOPS is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CamCOPS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CamCOPS. If not, see <http://www.gnu.org/licenses/>.


CamCOPS code to support SNOMED-CT.

Note that the licensing arrangements for SNOMED-CT mean that the actual codes must be separate (and not part of the CamCOPS code). See the documentation.

A full SNOMED CT download is about 1.1 GB; see https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct. Within an archive such as uk_sct2cl_26.0.2_20181107000001.zip, relevant files include:

# Files containing "Amoxicillin" include two snapshot files and two full files; the full files are:

SnomedCT_UKClinicalRF2_PRODUCTION_20181031T000001Z/Full/Terminology/sct2_Description_Full-en-GB_GB1000000_20181031.txt
# ... 234,755 lines

SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z/Full/Terminology/sct2_Description_Full-en_INT_20180731.txt
# ... 2,513,953 lines; this is the main file.
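These description files are tab-separated text with a header row. A minimal sketch of scanning one for a term follows, assuming the standard RF2 description columns (id, effectiveTime, active, moduleId, conceptId, languageCode, typeId, term, caseSignificanceId). The function name `find_terms` is hypothetical, and the sample uses an in-memory stand-in whose description, module, and type IDs are made up:

```python
import csv
import io
from typing import List, TextIO, Tuple


def find_terms(f: TextIO, needle: str) -> List[Tuple[str, str]]:
    """
    Returns (conceptId, term) pairs for active descriptions whose term
    contains ``needle`` (case-insensitive).
    """
    # The file is tab-separated; treat quote characters as ordinary text.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    needle = needle.lower()
    return [
        (row["conceptId"], row["term"])
        for row in reader
        if row["active"] == "1" and needle in row["term"].lower()
    ]


# Stand-in for a real file; all IDs except the two concept IDs are made up.
sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tconceptId\tlanguageCode"
    "\ttypeId\tterm\tcaseSignificanceId\n"
    "1\t20180731\t1\t0\t372687004\ten\t0\tAmoxicillin (substance)\t0\n"
    "2\t20180731\t1\t0\t73211009\ten\t0\tDiabetes mellitus (disorder)\t0\n"
)
matches = find_terms(sample, "amoxicillin")
```

With a real file, `open(path, newline="", encoding="utf-8")` would replace the `io.StringIO` stand-in.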

Note grammar:

Test basic expressions:

import logging
from cardinal_pythonlib.logs import main_only_quicksetup_rootlogger
from camcops_server.cc_modules.cc_request import get_command_line_request
from camcops_server.cc_modules.cc_snomed import *
from camcops_server.tasks.phq9 import Phq9
main_only_quicksetup_rootlogger(level=logging.DEBUG)
req = get_command_line_request()

# ---------------------------------------------------------------------
# From the SNOMED-CT examples (http://snomed.org/scg), with some values
# fixed from the term browser:
# ---------------------------------------------------------------------

diabetes = SnomedConcept(73211009, "Diabetes mellitus (disorder)")
diabetes_expr = SnomedExpression(diabetes)
print(diabetes_expr.longform)
print(diabetes_expr.shortform)

pain = SnomedConcept(22253000, "Pain (finding)")
finding_site = SnomedConcept(363698007, "Finding site")
foot = SnomedConcept(56459004, "Foot")

pain_in_foot = SnomedExpression(pain, {finding_site: foot})
print(pain_in_foot.longform)
print(pain_in_foot.shortform)

amoxicillin_medicine = SnomedConcept(27658006, "Product containing amoxicillin (medicinal product)")
amoxicillin_substance = SnomedConcept(372687004, "Amoxicillin (substance)")
has_dose_form = SnomedConcept(411116001, "Has manufactured dose form (attribute)")
capsule = SnomedConcept(385049006, "Capsule (basic dose form)")
has_active_ingredient = SnomedConcept(127489000, "Has active ingredient (attribute)")
has_basis_of_strength_substance = SnomedConcept(732943007, "Has basis of strength substance (attribute)")
mass = SnomedConcept(118538004, "Mass, a measure of quantity of matter (property) (qualifier value)")
unit_of_measure = SnomedConcept(767524001, "Unit of measure (qualifier value)")
milligrams = SnomedConcept(258684004, "milligram (qualifier value)")

amoxicillin_500mg_capsule = SnomedExpression(
    amoxicillin_medicine, [
        SnomedAttributeSet({has_dose_form: capsule}),
        SnomedAttributeGroup({
            has_active_ingredient: amoxicillin_substance,
            has_basis_of_strength_substance: SnomedExpression(
                amoxicillin_substance, {
                    mass: 500,
                    unit_of_measure: milligrams,
                }
            ),
        }),
    ]
)
print(amoxicillin_500mg_capsule.longform)
print(amoxicillin_500mg_capsule.shortform)

# ---------------------------------------------------------------------
# Read the XML, etc.
# ---------------------------------------------------------------------

print(VALID_SNOMED_LOOKUPS)
concepts = get_all_snomed_concepts(req.config.snomed_xml_filename)

# ---------------------------------------------------------------------
# Test a task, and loading SNOMED data from XML via the CamCOPS config
# ---------------------------------------------------------------------

phq9 = Phq9()
print("\n".join(str(x) for x in phq9.get_snomed_codes(req)))
phq9.q1 = 2
phq9.q2 = 2
phq9.q3 = 2
phq9.q4 = 2
phq9.q5 = 2
phq9.q6 = 2
phq9.q7 = 2
phq9.q8 = 2
phq9.q9 = 2
phq9.q10 = 2
print("\n".join(repr(x) for x in phq9.get_snomed_codes(req)))
print("\n".join(str(x) for x in phq9.get_snomed_codes(req)))

Note diagnostic coding maps:

Other testing:

camcops_server dev_cli --verbose

from camcops_server.cc_modules.cc_snomed import *

athena_concepts = get_athena_concepts(config.athena_concept_tsv_filename)
relationships = get_athena_concept_relationships(config.athena_concept_relationship_tsv_filename)
rel_ids = set(r.relationship_id for r in relationships)
icd9, icd10 = get_icd9cm_icd10_snomed_concepts_from_athena(config.athena_concept_tsv_filename, config.athena_concept_relationship_tsv_filename)

ac = get_athena_concepts(config.athena_concept_tsv_filename, vocabulary_ids=[AthenaVocabularyId.SNOMED], concept_codes=["4303690"])

class camcops_server.cc_modules.cc_snomed.AthenaConceptRelationshipRow(concept_id_1: str, concept_id_2: str, relationship_id: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)[source]

Simple information-holding class for a row of the CONCEPT_RELATIONSHIP.csv file from the http://athena.ohdsi.org/ vocabulary download.

Argument order is important.

Parameters:
  • concept_id_1 – Athena concept ID #1
  • concept_id_2 – Athena concept ID #2
  • relationship_id – e.g. “Is a”, “Has legal category”
  • valid_start_date – date in YYYYMMDD format
  • valid_end_date – date in YYYYMMDD format
  • invalid_reason – ? (but one can guess)
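The note that argument order is important suggests that rows are built positionally from the columns of the file. A minimal sketch of that pattern follows, using a hypothetical stand-in class (`RelationshipRow`) and helper (`read_rows`) rather than the real ones. The sample row's IDs and dates are made up:

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class RelationshipRow(NamedTuple):
    """Hypothetical stand-in; fields follow the documented argument order."""
    concept_id_1: str
    concept_id_2: str
    relationship_id: str
    valid_start_date: str
    valid_end_date: str
    invalid_reason: str


def read_rows(f: TextIO) -> List[RelationshipRow]:
    reader = csv.reader(f, delimiter="\t")
    next(reader)  # skip the header row
    # Each row is unpacked positionally, so the column order in the file
    # must match the field order of the class.
    return [RelationshipRow(*fields) for fields in reader]


sample = io.StringIO(
    "concept_id_1\tconcept_id_2\trelationship_id"
    "\tvalid_start_date\tvalid_end_date\tinvalid_reason\n"
    "1\t2\tIs a\t19700101\t20991231\t\n"
)
rows = read_rows(sample)
```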
class camcops_server.cc_modules.cc_snomed.AthenaConceptRow(concept_id: str, concept_name: str, domain_id: str, vocabulary_id: str, concept_class_id: str, standard_concept: str, concept_code: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)[source]

Simple information-holding class for a row of the CONCEPT.csv file from the http://athena.ohdsi.org/ vocabulary download.

Argument order is important.

Parameters:
  • concept_id – Athena concept ID
  • concept_name – Concept name in the originating system
  • domain_id – e.g. “Observation”, “Condition”
  • vocabulary_id – e.g. “SNOMED”, “ICD10CM”
  • concept_class_id – e.g. “Substance”, “3-char nonbill code”
  • standard_concept – ?; e.g. “S”
  • concept_code – concept code in the vocabulary (e.g. SNOMED-CT concept code like “3578611000001105” if vocabulary_id is “SNOMED”; ICD-10 code like “F32.2” if vocabulary_id is “ICD10CM”; etc.)
  • valid_start_date – date in YYYYMMDD format
  • valid_end_date – date in YYYYMMDD format
  • invalid_reason – ? (but one can guess)
snomed_concept() → camcops_server.cc_modules.cc_snomed.SnomedConcept[source]

Assuming this Athena concept reflects a SnomedConcept, returns it.

(Raises an assertion error if it isn’t.)

class camcops_server.cc_modules.cc_snomed.AthenaRelationshipId[source]

Constant-holding class for Athena relationship IDs that we care about.

class camcops_server.cc_modules.cc_snomed.AthenaVocabularyId[source]

Constant-holding class for Athena vocabulary IDs that we care about.

class camcops_server.cc_modules.cc_snomed.SnomedAttribute(name: camcops_server.cc_modules.cc_snomed.SnomedConcept, value: Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str])[source]

Represents a SNOMED-CT attribute, being a name/value pair.

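Parameters:
  • name – the attribute name; a SnomedConcept
  • value – the attribute value; a SnomedConcept, a SnomedExpression, or a concrete int, float, or str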
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedAttributeGroup(attribute_set: Union[typing.Dict[_ForwardRef('SnomedConcept'), typing.Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str]], camcops_server.cc_modules.cc_snomed.SnomedAttributeSet])[source]

Represents a collected group of attribute/value pairs.

Parameters:attribute_set – a SnomedAttributeSet to group
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedAttributeSet(attributes: Union[typing.Dict[_ForwardRef('SnomedConcept'), typing.Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str]], typing.Iterable[camcops_server.cc_modules.cc_snomed.SnomedAttribute]])[source]

Represents an attribute set.

Parameters:attributes – the attributes
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedBase[source]

Common functions for SNOMED-CT classes

as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
shortform

Returns the short form, without terms.

xml_element(longform: bool = True) → camcops_server.cc_modules.cc_xml.XmlElement[source]

Returns a camcops_server.cc_modules.cc_xml.XmlElement for this SNOMED-CT object.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedConcept(identifier: int, term: str)[source]

Represents a SNOMED concept with its description (associated term).

Parameters:
  • identifier – SNOMED-CT identifier (code)
  • term – associated term (description)
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
concept_reference(longform: bool = True) → str[source]

Returns one of the string representations.

Parameters:longform – in long form, with the description (associated term)?
class camcops_server.cc_modules.cc_snomed.SnomedExpression(focus_concept: Union[camcops_server.cc_modules.cc_snomed.SnomedConcept, camcops_server.cc_modules.cc_snomed.SnomedFocusConcept], refinement: Union[camcops_server.cc_modules.cc_snomed.SnomedRefinement, typing.Dict[_ForwardRef('SnomedConcept'), typing.Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str]], typing.List[typing.Union[camcops_server.cc_modules.cc_snomed.SnomedAttributeSet, camcops_server.cc_modules.cc_snomed.SnomedAttributeGroup]]] = None)[source]

An expression containing several SNOMED-CT codes in relationships.

Parameters:
  • focus_concept – the core concept(s); a SnomedConcept or a SnomedFocusConcept
  • refinement – optional additional information; a SnomedRefinement or a dictionary or list that can be converted to one
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
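As a rough illustration of the long and short forms: in SNOMED CT compositional grammar, a concept reference is the identifier optionally followed by its term between pipe characters, and a refinement follows the focus concept after a colon as name = value pairs. A minimal sketch under those assumptions follows. The `ref` and `expression_sketch` helpers and the tuple stand-in for SnomedConcept are hypothetical, and the exact whitespace emitted by the real classes may differ:

```python
from typing import Dict, Tuple

Concept = Tuple[int, str]  # (identifier, term); stand-in for SnomedConcept


def ref(concept: Concept, longform: bool) -> str:
    """Long form includes the term between pipes; short form is the ID only."""
    identifier, term = concept
    return f"{identifier} |{term}|" if longform else str(identifier)


def expression_sketch(focus: Concept,
                      refinement: Dict[Concept, Concept],
                      longform: bool = True) -> str:
    """Renders 'focus : name = value, name = value'."""
    text = ref(focus, longform)
    if refinement:
        attrs = ", ".join(
            f"{ref(name, longform)} = {ref(value, longform)}"
            for name, value in refinement.items()
        )
        text += f" : {attrs}"
    return text


pain = (22253000, "Pain (finding)")
finding_site = (363698007, "Finding site")  # the "Finding site" attribute
foot = (56459004, "Foot")

long_text = expression_sketch(pain, {finding_site: foot})
short_text = expression_sketch(pain, {finding_site: foot}, longform=False)
```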
class camcops_server.cc_modules.cc_snomed.SnomedFocusConcept(concept: Union[camcops_server.cc_modules.cc_snomed.SnomedConcept, typing.Iterable[camcops_server.cc_modules.cc_snomed.SnomedConcept]])[source]

Represents a SNOMED-CT focus concept, which is one or more concepts.

Parameters:concept – the core concept(s); a SnomedConcept or an iterable of them
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedLookup[source]

We’re not allowed to embed SNOMED-CT codes in the CamCOPS code. Therefore, within CamCOPS, we use string constants represented in this class. If the local institution is allowed (e.g. in the UK, as below), then it can install additional data.

Abbreviations:

  • “Finding” is not abbreviated
  • “Obs” or “observable” is short for “observable entity”
  • “Procedure” is not abbreviated
  • “Scale” is short for “assessment scale”
  • “Situation” is not abbreviated

Variable names are designed for clear code. Value strings are designed for clear XML that matches SNOMED-CT, in the format TASK_CONCEPTTYPE_NAME.
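A minimal sketch of this indirection follows: code refers only to string constants, and the actual codes come from separately installed data. Everything here is hypothetical, including the class name `ExampleLookup`, the constant names, the lowercase value strings, and the placeholder identifier and term, none of which are real SNOMED-CT content:

```python
from typing import Dict, Optional, Tuple


class ExampleLookup:
    """Hypothetical constants in the TASK_CONCEPTTYPE_NAME style."""
    EXAMPLETASK_SCALE = "exampletask_scale"
    EXAMPLETASK_OBSERVABLE_SCORE = "exampletask_observable_score"


# Locally installed data maps lookup strings to (identifier, term) pairs.
# The identifier and term below are placeholders, not real codes.
installed: Dict[str, Tuple[int, str]] = {
    ExampleLookup.EXAMPLETASK_SCALE: (1, "Placeholder scale"),
}


def resolve(lookup: str) -> Optional[Tuple[int, str]]:
    """Returns the installed concept for a lookup string, or None."""
    return installed.get(lookup)
```

An institution without the additional data would simply have an empty `installed` mapping, so every lookup would resolve to None.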

class camcops_server.cc_modules.cc_snomed.SnomedRefinement(refinements: Union[typing.Dict[_ForwardRef('SnomedConcept'), typing.Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str]], typing.Iterable[typing.Union[camcops_server.cc_modules.cc_snomed.SnomedAttributeSet, camcops_server.cc_modules.cc_snomed.SnomedAttributeGroup]]])[source]

Implements a SNOMED-CT “refinement”, which is an attribute set +/- some attribute groups.

Parameters:refinements – iterable of SnomedAttributeSet (but only zero or one) and SnomedAttributeGroup objects
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.SnomedValue(value: Union[_ForwardRef('SnomedConcept'), _ForwardRef('SnomedExpression'), int, float, str])[source]

Represents a value: either a concrete value (e.g. int, float, str), or a SNOMED-CT concept/expression.

Implements the grammar elements: attributeValue, expressionValue, stringValue, numericValue, integerValue, decimalValue.

Parameters:value – the value
as_string(longform: bool = True) → str[source]

Returns the string form.

Parameters:longform – print SNOMED-CT concepts in long form?
class camcops_server.cc_modules.cc_snomed.UmlsIcd9SnomedRow(icd_code: str, icd_name: str, is_current_icd: str, ip_usage: str, op_usage: str, avg_usage: str, is_nec: str, snomed_cid: str, snomed_fsn: str, is_one_to_one_map: str, core_usage: str, in_core: str)[source]

Simple information-holding class for a row of the ICD-9-CM TSV file, from https://www.nlm.nih.gov/research/umls/mapping_projects/icd9cm_to_snomedct.html.

NOT CURRENTLY USED.

Argument order is important.

Parameters:
  • icd_code – ICD-9-CM code
  • icd_name – Name of ICD-9-CM entity
  • is_current_icd – ?
  • ip_usage – ?
  • op_usage – ?
  • avg_usage – ?
  • is_nec – ?
  • snomed_cid – SNOMED-CT concept ID
  • snomed_fsn – SNOMED-CT fully specified name
  • is_one_to_one_map – ?; possibly always true in this dataset but not true in a broader dataset including things other than 1:1 mappings?
  • core_usage – ?
  • in_core – ?

snomed_concept() → camcops_server.cc_modules.cc_snomed.SnomedConcept[source]

Returns the associated SNOMED-CT concept.

class camcops_server.cc_modules.cc_snomed.UmlsSnomedToIcd10Row(row_id: str, effective_time: str, active: str, module_id: str, refset_id: str, referenced_component_id: str, referenced_component_name: str, map_group: str, map_priority: str, map_rule: str, map_advice: str, map_target: str, map_target_name: str, correlation_id: str, map_category_id: str, map_category_name: str)[source]

Simple information-holding class for a row of the ICD-10-CM TSV file from https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html.

However, that mapping is unhelpful here because it is many-to-one.

NOT CURRENTLY USED.

Argument order is important.

Parameters:
  • row_id – UUID format or similar
  • effective_time – date in YYYYMMDD format
  • active – ?
  • module_id – ?
  • refset_id – ?
  • referenced_component_id – SNOMED-CT concept ID
  • referenced_component_name – SNOMED-CT concept name
  • map_group – ?; e.g. 1
  • map_priority – ? but e.g. 1, 2; correlates with map_rule
  • map_rule – ?; e.g. “TRUE”; “OTHERWISE TRUE”
  • map_advice – ?, but e.g. “ALWAYS F32.2” or “ALWAYS F32.2 | DESCENDANTS NOT EXHAUSTIVELY MAPPED”
  • map_target – ICD-10 code
  • map_target_name – ICD-10 name
  • correlation_id – a SNOMED-CT concept for the mapping, e.g. 447561005 = “SNOMED CT source code to target map code correlation not specified (foundation metadata concept)”
  • map_category_id – a SNOMED-CT concept for the mapping, e.g. 447637006 = “Map source concept is properly classified (foundation metadata concept)”
  • map_category_name – SNOMED-CT name corresponding to map_category_id, e.g. “MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED”
snomed_concept() → camcops_server.cc_modules.cc_snomed.SnomedConcept[source]

Returns the associated SNOMED-CT concept.

camcops_server.cc_modules.cc_snomed.double_quoted(s: str) → str[source]

Returns a representation of the string argument with double quotes and escaped characters.

Parameters:s – the argument

See:

Test code:

from camcops_server.cc_modules.cc_snomed import double_quoted

def test(s):
    print(f"double_quoted({s!r}) -> {double_quoted(s)}")


test("ab'cd")
test("ab'c\"d")
test('ab"cd')
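For orientation, a minimal sketch of the kind of escaping described follows, assuming the compositional grammar's string rule in which only the backslash and the double quote are escaped, each with a preceding backslash. `double_quoted_sketch` is a hypothetical stand-in; the real function may differ in detail:

```python
def double_quoted_sketch(s: str) -> str:
    """Wraps s in double quotes, escaping backslashes and double quotes."""
    # Escape backslashes first, so that the backslashes added for quotes
    # are not themselves doubled.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


single = double_quoted_sketch("ab'cd")    # single quote is left alone
double = double_quoted_sketch('ab"cd')    # double quote gains a backslash
backslash = double_quoted_sketch("a\\b")  # backslash is doubled
```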
camcops_server.cc_modules.cc_snomed.get_all_icd10_snomed_concepts_from_umls(tsv_filename: str) → Dict[str, camcops_server.cc_modules.cc_snomed.SnomedConcept][source]

Reads in all ICD-10 SNOMED-CT codes that are supported by the client, from the UMLS data file, from https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html.

Parameters:tsv_filename – TSV filename to read
Returns:maps lookup strings to SnomedConcept objects
Return type:dict

NOT CURRENTLY USED.

camcops_server.cc_modules.cc_snomed.get_all_icd9cm_snomed_concepts_from_umls(tsv_filename: str) → Dict[str, camcops_server.cc_modules.cc_snomed.SnomedConcept][source]

Reads in all ICD-9-CM SNOMED-CT codes that are supported by the client, from the UMLS data file, from https://www.nlm.nih.gov/research/umls/mapping_projects/icd9cm_to_snomedct.html.

Parameters:tsv_filename – TSV filename to read
Returns:maps lookup strings to SnomedConcept objects
Return type:dict

NOT CURRENTLY USED.

camcops_server.cc_modules.cc_snomed.get_all_task_snomed_concepts(xml_filename: str) → Dict[str, camcops_server.cc_modules.cc_snomed.SnomedConcept][source]

Reads in all SNOMED-CT codes for CamCOPS tasks, from the custom CamCOPS XML file for this.

Parameters:xml_filename – XML filename to read
Returns:maps lookup strings to SnomedConcept objects
Return type:dict
camcops_server.cc_modules.cc_snomed.get_athena_concept_relationships(tsv_filename: str, concept_id_1_values: Collection[int] = None, concept_id_2_values: Collection[int] = None, relationship_id_values: Collection[str] = None) → List[camcops_server.cc_modules.cc_snomed.AthenaConceptRelationshipRow][source]

From the Athena CONCEPT_RELATIONSHIP.csv tab-separated value file, return a list of relationships matching the restriction criteria.

Parameters:
  • tsv_filename – filename
  • concept_id_1_values – permissible concept_id_1 values, or None or an empty list for all
  • concept_id_2_values – permissible concept_id_2 values, or None or an empty list for all
  • relationship_id_values – permissible relationship_id values, or None or an empty list for all
Returns:a list of AthenaConceptRelationshipRow objects
Return type:list

camcops_server.cc_modules.cc_snomed.get_athena_concepts(tsv_filename: str, vocabulary_ids: Collection[str] = None, concept_codes: Collection[str] = None, concept_ids: Collection[int] = None) → List[camcops_server.cc_modules.cc_snomed.AthenaConceptRow][source]

From the Athena CONCEPT.csv tab-separated value file, return a list of concepts matching the restriction criteria.

Parameters:
  • tsv_filename – filename
  • vocabulary_ids – permissible vocabulary_id values, or None or an empty list for all
  • concept_codes – permissible concept_code values, or None or an empty list for all
  • concept_ids – permissible concept_id values, or None or an empty list for all
Returns:a list of AthenaConceptRow objects
Return type:list
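The convention that None or an empty collection means "no restriction" can be sketched as follows. This is an illustrative stand-alone filter over dictionaries, not the real function; `filter_rows` and `permitted` are hypothetical names, and only two of the documented criteria are shown:

```python
from typing import Collection, Dict, List, Optional


def filter_rows(rows: List[Dict[str, str]],
                vocabulary_ids: Optional[Collection[str]] = None,
                concept_codes: Optional[Collection[str]] = None
                ) -> List[Dict[str, str]]:
    """Keeps rows that pass every non-empty restriction."""
    def permitted(value: str, allowed: Optional[Collection[str]]) -> bool:
        # None or an empty collection means "no restriction".
        return not allowed or value in allowed

    return [
        row for row in rows
        if permitted(row["vocabulary_id"], vocabulary_ids)
        and permitted(row["concept_code"], concept_codes)
    ]


rows = [
    {"vocabulary_id": "SNOMED", "concept_code": "73211009"},
    {"vocabulary_id": "ICD10CM", "concept_code": "F32.2"},
]
snomed_only = filter_rows(rows, vocabulary_ids=["SNOMED"])
everything = filter_rows(rows, vocabulary_ids=[])
```

For large files, passing the permitted values as sets keeps each membership test cheap.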

camcops_server.cc_modules.cc_snomed.get_icd10_snomed_concepts_from_xml(xml_filename: str) → Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]][source]

Reads in all ICD-10 SNOMED-CT codes from a custom CamCOPS XML file.

Parameters:xml_filename – filename to read
Returns:maps ICD-10 codes to lists of SnomedConcept objects
Return type:dict
camcops_server.cc_modules.cc_snomed.get_icd9_snomed_concepts_from_xml(xml_filename: str) → Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]][source]

Reads in all ICD-9-CM SNOMED-CT codes from a custom CamCOPS XML file.

Parameters:xml_filename – filename to read
Returns:maps ICD-9-CM codes to lists of SnomedConcept objects
Return type:dict
camcops_server.cc_modules.cc_snomed.get_icd9cm_icd10_snomed_concepts_from_athena(athena_concept_tsv_filename: str, athena_concept_relationship_tsv_filename: str) → Tuple[Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]], Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]]][source]

Takes Athena concept and concept-relationship files, and fetches details of the SNOMED-CT codes for all ICD-9-CM and ICD-10[-CM] codes used by CamCOPS.

A bit of human review is required; this is probably preferable to using gensim or some other automatic similarity check.

Parameters:
  • athena_concept_tsv_filename – path to CONCEPT.csv (a tab-separated value file)
  • athena_concept_relationship_tsv_filename – path to CONCEPT_RELATIONSHIP.csv (a tab-separated value file)
Returns:icd9cm, icd10, where each is a dictionary mapping ICD codes to a list of mapped SnomedConcept objects.
Return type:tuple
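One way such a mapping could be assembled is by joining the concept rows to the relationship rows. The sketch below is an assumption about the general approach, not the function's actual logic: the tuple layouts, the name `map_icd_to_snomed`, and the choice of the "Maps to" relationship are all illustrative, and the sample IDs, code, and names are placeholders:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Stand-ins: concept rows as (concept_id, vocabulary_id, concept_code, name),
# relationship rows as (concept_id_1, concept_id_2, relationship_id).
ConceptRow = Tuple[str, str, str, str]
RelRow = Tuple[str, str, str]


def map_icd_to_snomed(concepts: List[ConceptRow],
                      relationships: List[RelRow],
                      icd_vocabulary: str,
                      relationship_id: str = "Maps to"
                      ) -> Dict[str, List[Tuple[str, str]]]:
    """Maps each ICD code to a list of (SNOMED code, name) pairs."""
    by_id = {c[0]: c for c in concepts}
    result = defaultdict(list)
    for id_1, id_2, rel in relationships:
        source, target = by_id.get(id_1), by_id.get(id_2)
        if rel != relationship_id or source is None or target is None:
            continue
        if source[1] == icd_vocabulary and target[1] == "SNOMED":
            result[source[2]].append((target[2], target[3]))
    return dict(result)


# Placeholder data; "0000" is not a real SNOMED-CT code.
concepts = [
    ("1", "ICD10CM", "F32.2", "Placeholder ICD-10 name"),
    ("2", "SNOMED", "0000", "Placeholder SNOMED term"),
]
relationships = [("1", "2", "Maps to"), ("2", "1", "Mapped from")]
icd10 = map_icd_to_snomed(concepts, relationships, "ICD10CM")
```

Because one ICD code can map to several concepts, each value is a list; this is where the human review mentioned above comes in.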

camcops_server.cc_modules.cc_snomed.get_multiple_snomed_concepts_from_xml(xml_filename: str, valid_lookups: Set[str] = None, require_all: bool = False) → Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]][source]

Reads in all SNOMED-CT codes for ICD-9 or ICD-10, from the custom CamCOPS XML file for this (made by e.g. send_athena_icd_snomed_to_xml()).

Parameters:
  • xml_filename – XML filename to read
  • valid_lookups – possible lookup values
  • require_all – require that valid_lookups is truthy and that all values in it are present in the XML
Returns:maps lookup strings to lists of SnomedConcept objects
Return type:dict
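The interaction between valid_lookups and require_all can be sketched as a set check. This is a hypothetical helper (`check_lookups`), not the real function. Raising ValueError and silently discarding lookups outside valid_lookups are both assumptions:

```python
from typing import Optional, Set


def check_lookups(found: Set[str],
                  valid_lookups: Optional[Set[str]] = None,
                  require_all: bool = False) -> Set[str]:
    """
    Returns the acceptable subset of ``found``. Raises ValueError (an
    assumed choice of exception) if ``require_all`` is violated.
    """
    if require_all:
        # require_all only makes sense with a non-empty set to check against.
        if not valid_lookups:
            raise ValueError("require_all needs a non-empty valid_lookups")
        missing = valid_lookups - found
        if missing:
            raise ValueError(f"Missing lookups: {sorted(missing)}")
    if valid_lookups:
        # Assumed behaviour: lookups not in valid_lookups are dropped.
        return found & valid_lookups
    return set(found)


accepted = check_lookups({"a", "b"}, valid_lookups={"a"})
```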

camcops_server.cc_modules.cc_snomed.get_snomed_concepts_from_xml(xml_filename: str) → Dict[str, Union[camcops_server.cc_modules.cc_snomed.SnomedConcept, typing.List[camcops_server.cc_modules.cc_snomed.SnomedConcept]]][source]

Reads in all SNOMED-CT concepts from an XML file according to the CamCOPS format.

Parameters:xml_filename – XML filename to read
Returns:mapping each lookup code found to a list of SnomedConcept objects
Return type:dict
camcops_server.cc_modules.cc_snomed.send_athena_icd_snomed_to_xml(athena_concept_tsv_filename: str, athena_concept_relationship_tsv_filename: str, icd9_xml_filename: str, icd10_xml_filename) → None[source]

Reads SNOMED-CT codes for ICD-9-CM and ICD-10 from Athena OHDSI files, and writes them to XML files in the CamCOPS format.

Parameters:
  • athena_concept_tsv_filename – path to CONCEPT.csv (a tab-separated value file)
  • athena_concept_relationship_tsv_filename – path to CONCEPT_RELATIONSHIP.csv (a tab-separated value file)
  • icd9_xml_filename – ICD-9 XML filename to write
  • icd10_xml_filename – ICD-10 XML filename to write
camcops_server.cc_modules.cc_snomed.write_snomed_concepts_to_xml(xml_filename: str, concepts: Dict[str, List[camcops_server.cc_modules.cc_snomed.SnomedConcept]], comment: str = 'Autogenerated XML (see camcops_server.cc_modules.cc_snomed.py); do not edit') → None[source]

Writes SNOMED-CT concepts to an XML file in the CamCOPS format.

Parameters:
  • xml_filename – XML filename to write
  • concepts – dictionary mapping lookup codes to a list of SnomedConcept objects
  • comment – comment for XML file