The Yale Guideline Recommendation Corpus:

A Representative Sample of the Knowledge Content of Guidelines

Authors:

Tamseela Hussain, MD

George Michel, MS

Richard N. Shiffman, MD, MCIS

Affiliation:

Yale Center for Medical Informatics

Yale University School of Medicine

New Haven, CT

Correspondence:

Tamseela Hussain

Fellow in Biomedical Informatics

Yale Center for Medical Informatics

300 George Street Suite 501

New Haven, CT 06511

Phone: (203)-737-6091

Fax: (203)-737-5708

tamseela.hussain@yale.edu

Abstract:

Objective: To develop and characterize a large, representative sample of guideline recommendations that can be used to better understand how current recommendations are written and to test the adequacy of guideline models. We refer to this sample as the Yale Guideline Recommendation Corpus (YGRC).

Method: To develop the YGRC, we extracted recommendations from guidelines downloaded from the National Guideline Clearinghouse (NGC). We evaluated the representativeness of the YGRC by comparing the frequency of use of controlled vocabulary terms in the YGRC sample and in the NGC. We examined semantic and formatting indicators that were used to denote recommendation statements.

Results: In the course of reviewing 7527 recommendation statements, we extracted 1275 recommendations from the NGC and characterized the guidelines from which they were derived. Both semantic and formatting indicators were used inconsistently to denote recommendations. Recommendation statements were not reliably identifiable in 31.6% (310/982) of the guidelines, and many recommendations were not actionable as written. We also found variability and inconsistency in the way strength of recommendation is currently reported. Over half of the recommendations (52.7%) did not indicate strength, while 6.6% inaccurately indicated strength.

Conclusion: The YGRC provides a representative sample of current guideline recommendations and demonstrates considerable variability and inconsistency in the way recommendations are written and in the way the recommendation strength is currently reported.

Key words: Guidelines, Recommendations, National Guideline Clearinghouse, Strength

I. INTRODUCTION

Clinical practice guidelines are intended to directly improve the processes of health care and ultimately to improve the outcomes experienced by patients. Evidence-based guidelines aid in optimizing clinical decision making by suggesting a course of action based on conscientious, explicit, and judicious use of current best evidence about the care of individual patients (1). Guidelines vary greatly in terms of both their method of development and the utility of the finished products.

Clinical guidelines contain recommendation statements that define appropriate care and, in so doing, differentiate guidelines from other publications such as systematic reviews. Most recommendations consist of relatively straightforward declarative statements that advocate a particular clinical practice. Ideally, each recommendation should describe precisely the nature of the proposed actions as well as the exact circumstances under which the actions should be undertaken (2). Specific, concrete recommendation statements are more likely to be understood, remembered, and acted upon, and can serve as a basis for the development of benchmarks or performance indicators. Presenting evidence and recommendations in a clear, concise, and accessible manner facilitates the retrieval and assimilation of specific information (3). Yet many guidelines include vague and seriously underspecified recommendations that make implementation difficult (4)(5).

Users of guidelines need to know how to apply the knowledge contained in guidelines effectively and how much confidence to place in the recommendations. This information is most often conveyed by categorizing the quality of the body of evidence on which each recommendation is based. Quality of evidence is defined as the extent to which one can be confident that an estimate of effect is correct (6). In addition to the quality of evidence, many guideline developers have also recognized the critical importance of weighing the benefits that may be anticipated when a recommendation is followed against any expected risks, harms, and costs (7). This judgment is referred to most often as the Recommendation Strength. Recommendation strength translates into an expectation of level of adherence. Guideline authors at several sites, including the American Academy of Pediatrics, the American Academy of Otolaryngology-Head and Neck Surgery, the American Thoracic Society, and the American College of Chest Physicians, explicitly consider and report recommendation strength (7-10).

The application of the concept of recommendation strength in guidelines has not been examined systematically. Previous studies that addressed strength of recommendation have done so on small, non-representative samples of recommendations and have either discussed the need for a uniform system or advocated their own system, such as GRADE, SORT, or the modification of GRADE used by the American College of Chest Physicians (10-12).

Previous studies in modeling guideline recommendations for implementation in computer-based decision support systems have often relied on small numbers of recommendations selected from limited, convenient samples of guidelines (see Table 1). We believe such studies may result in knowledge models that fit the selected recommendations well, but may fail to effectively represent large numbers of guideline recommendations.

Insert Table 1: Guidelines Modeled in Several Current Guideline Representations

The primary objective of this work is to develop and characterize a large, representative sample of guideline recommendations that can be used to better understand how current recommendations are written. We refer to this sample as the Yale Guideline Recommendation Corpus (YGRC). In the following sections, we describe the process of YGRC development, characteristics of the guidelines from which the recommendations are derived, the difficulties we encountered in identifying and extracting recommendation statements from guideline text, and use of the corpus to describe the prevalence of recommendation strength statements.

II. METHODS

II.A. Guideline Selection

To initiate the development of a representative sample of guideline recommendations, we downloaded all 1,964 guideline summaries available at the Agency for Healthcare Research and Quality's (AHRQ) National Guideline Clearinghouse (NGC) website on June 15, 2007. The NGC provides a comprehensive, web-accessible database of summaries of evidence-based clinical practice guidelines and related documents. These summaries are prepared for AHRQ by ECRI, a contractor organization that develops the summaries according to a set of carefully prescribed protocols in consultation with the organizations that authored the guidelines.

To be included in the NGC:

Guidelines must be produced under the auspices of medical specialty associations, professional societies, public or private organizations, government agencies, health care organizations, or plans.
Guideline development must include a systematic literature search and review of scientific evidence.
The full text of the guideline must be available in English.
Guidelines must have been developed, reviewed, or revised within the past 5 years.

Each NGC summary is an XML document that accommodates text content in up to 55 elements per guideline. Thirteen of these elements use controlled vocabularies of terms to classify guideline attributes. Coding of elements is not required and the concepts are not mutually exclusive, so each guideline may contain one, several, or no controlled vocabulary terms in each field. We used the controlled vocabulary terms to characterize subsets of guidelines we selected from the NGC.

Insert Figure 1. Method of the Development of the YGRC

Our goal was to include at least 1000 recommendation statements that were broadly representative of all the currently available guideline recommendations at the NGC website. The method by which the Yale Guideline Recommendation Corpus was developed from the NGC is summarized in Figure 1. We numbered each guideline summary sequentially and selected those with odd-numbered identifiers to achieve a representative sample of guidelines (N = 982).
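The odd-numbered selection described above amounts to a 1-in-2 systematic sample. A minimal sketch, assuming the summaries are held in a list in download order; the function name and the placeholder identifiers are hypothetical and do not come from the original study:

```python
def select_odd_numbered(summaries):
    """Number the summaries sequentially from 1 and keep the odd-numbered ones."""
    return [item for number, item in enumerate(summaries, start=1)
            if number % 2 == 1]


# Placeholder identifiers standing in for the 1,964 downloaded summaries.
all_summaries = [f"summary-{i}" for i in range(1, 1965)]
sample = select_odd_numbered(all_summaries)
print(len(sample))  # 982, matching the N reported above
```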

II.B. Selection of Recommendations

In each NGC guideline summary, all recommendations are aggregated within a single XML element entitled "Major Recommendations." They are accompanied by highly variable text that describes, for example, background information about the disease or condition to which the recommendation applies, the rationale for the recommendation, and information that amplifies how the recommendation might be carried out. We attempted to identify statements that were recognizable as individual recommendations within the text of this element. We operationally defined a recommendation as a statement whose apparent intent is to provide guidance about the advisability of a clinical action. Recommendations were identified based on semantic considerations, formatting (such as bullets, bolded text, and enumeration), headers (that include descriptors such as "recommended"), and the presence of recommendation strength indicators. Guidelines were next reviewed for consistency of presentation. If recommendations were not consistently recognizable throughout a given guideline, then the guideline was set aside for further review (see below).

We then counted the total number of recommendations in each eligible guideline and, using a random number generator, selected 3 recommendations from each. Random selection was necessary to avoid an order bias, because recommendations are often organized in a sequence in which the first statements address screening and diagnostic considerations and later recommendations address management considerations. To avoid oversampling, guidelines that contained fewer than 3 recommendations were excluded. Guidelines with more than 100 recommendations were also excluded from the sampling because consistent counting of large numbers of recommendations was not feasible. We excluded those guidelines that represented recommendations in algorithmic (flowchart) format because we planned to focus on declarative rather than procedural knowledge. We also excluded those guidelines that provided recommendations in tables because the tabular format encodes meta-information that would be difficult to capture consistently in a corpus of textual statements. Finally, the text of each recommendation was extracted and entered into a MySQL database along with the source guideline's title and the developer's name to create the YGRC.
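The eligibility rules and the random draw of 3 recommendations per guideline can be sketched as follows. This is an illustration only: the Guideline record, its field names, and the use of Python's random module are assumptions, not the tooling used to build the YGRC.

```python
import random
from dataclasses import dataclass


@dataclass
class Guideline:
    """Hypothetical record for one guideline summary."""
    title: str
    recommendations: list          # recommendation texts, in document order
    consistent: bool = True        # recommendations consistently identifiable
    algorithmic: bool = False      # presented as a flowchart
    tabular: bool = False          # presented in tables


def is_eligible(g, min_recs=3, max_recs=100):
    """Apply the exclusion criteria described in the text."""
    n = len(g.recommendations)
    return (g.consistent and not g.algorithmic and not g.tabular
            and min_recs <= n <= max_recs)


def sample_recommendations(guidelines, per_guideline=3, seed=None):
    """Randomly draw a fixed number of recommendations from each eligible guideline."""
    rng = random.Random(seed)
    corpus = []
    for g in guidelines:
        if is_eligible(g, min_recs=per_guideline):
            # Sampling without replacement avoids the order bias of taking the first few.
            for text in rng.sample(g.recommendations, per_guideline):
                corpus.append((g.title, text))
    return corpus
```

Drawing the same number of statements from every guideline keeps long guidelines from dominating the corpus, at the cost of sampling short guidelines more densely than long ones.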

For each recommendation entered into the MySQL database, we also collected information regarding the indication of strength of recommendations and categorized recommendation strength indicators as:

Present, i.e. the strength of this recommendation incorporated an appraisal of benefits, harms, risks and costs.

Absent, i.e. the strength of this recommendation was not indicated.

Strength Inaccurately Indicated, i.e. some information is indicated as recommendation strength that, in fact, merely describes evidence quality.
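The three categories above lend themselves to a simple tally. The sketch below assumes each sampled recommendation has already been labeled by a human reviewer; the enumeration and function names are hypothetical.

```python
from collections import Counter
from enum import Enum


class StrengthIndicator(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    INACCURATE = "strength inaccurately indicated"


def strength_percentages(labels):
    """Return the percentage of recommendations in each category."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {category: 100.0 * counts[category] / total
            for category in StrengthIndicator}
```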

For the 315 guideline summaries in which we were unable to reliably identify recommendation statements, we reviewed the original guideline statements from their sources to assure that the NGC summaries were congruent with the original publications. In 5 of the 315 guidelines (1.6%) that we excluded because recommendations were not consistently identifiable, the original publications provided text or formatting that made identification of the recommendation statements possible. Three of these guidelines were excluded from the YGRC pool because the recommendations were presented in tabular format or they contained fewer than 3 recommendations. The remaining 2 guidelines were added to the pool of YGRC source guidelines for randomization. That left a set of 310 guidelines whose recommendations were not consistently identifiable.

III. RESULTS

III.A. Characteristics of NGC and YGRC Guidelines

As shown in Table 2, most guidelines included in the NGC are developed by medical specialty societies (39.3%) and professional associations (15.8%). Non-US governmental agencies account for 14.1%, while US governmental contributions account for 9.9%.

Insert Table 2: Sources of guideline developers contributing to the 1964 guidelines at the National Guideline Clearinghouse.

Most guidelines were coded to indicate that they provide advice about treatment and management (see Table 3). Advice regarding evaluation was available in almost half of guidelines. Diagnostic assistance and advice regarding prevention were each provided by about one third of guidelines.

Insert Table 3: Application of Category Codes in the full NGC (1964 guidelines) and in the YGRC sample (425 guidelines).

We found 425 guidelines that met eligibility criteria for inclusion in the YGRC. From these guidelines we identified and enumerated 7527 recommendation statements and randomly selected 1275 recommendations from them. These recommendations cover a broad range of diseases and mental disorders.

To assure that the YGRC sample reflected the NGC content, we compared the proportion of YGRC guidelines coded with Guideline Category controlled vocabulary terms with the proportion of NGC guidelines that were coded with these terms (see Table 3). Because the NGC and YGRC were similar in rankings and percentages of Guideline Category code application (the frequencies of Screening and Assessment of Therapeutic Effectiveness differed slightly), we concluded that the YGRC subset was representative of the NGC with respect to guideline categories.
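A comparison of this kind can be sketched as follows, assuming each guideline is represented by the set of Guideline Category terms applied to it (a guideline may carry one, several, or no terms). The function names and data layout are hypothetical.

```python
def code_proportions(guideline_codes):
    """Proportion of guidelines to which each category term is applied.

    guideline_codes: list of sets of terms, one set per guideline.
    """
    n = len(guideline_codes)
    counts = {}
    for codes in guideline_codes:
        for term in codes:
            counts[term] = counts.get(term, 0) + 1
    return {term: k / n for term, k in counts.items()}


def ranking(proportions):
    """Terms ordered from most to least frequently applied (ties broken by name)."""
    return sorted(proportions, key=lambda term: (-proportions[term], term))


def compare_proportions(full, sample):
    """Side-by-side (full collection, sample) proportion for every term."""
    p_full, p_sample = code_proportions(full), code_proportions(sample)
    terms = sorted(set(p_full) | set(p_sample))
    return {t: (p_full.get(t, 0.0), p_sample.get(t, 0.0)) for t in terms}
```

Because the terms are not mutually exclusive, the proportions need not sum to 1; the comparison is therefore made term by term and by rank order rather than as a single distribution.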

III.B. Identification of Recommendation Statements

To facilitate consistent identification, extraction, and counting we defined several indicators that we used to recognize recommendations. These indicators include:

III.B.1. Semantic indicators

1. Recommendations may include (a) modal operators (e.g., terms such as "should," "must," "may") to express a level of obligation or permission or (b) statements of suitability under specific circumstances (e.g., "is appropriate," "is indicated").

Example:

An F 18-deoxyglucose positron emission tomography (FDG-PET) scan should be performed to investigate solitary pulmonary nodules in cases where a biopsy is not possible or has failed, depending on nodule size, position and CT characterization (13).

2. Indicative headings and titles, such as "Recommendations" and "Recommended," may be used to demarcate recommendation statements.

Example:

Recommendation: Treat duodenal ulcers with H2RAs or PPIs for 4 to 8 weeks (14).

3. Guidance may be presented in concise paragraphs.

In the most recognizable format, as shown in the following example, the first (topic) sentence of the paragraph explicitly states an advisable course of action, although other formatting indicators are absent. The information that follows amplifies and explains the recommended activity.

Example:

Beginning in their 20s, women should be told about the benefits and limitations of breast self examination (BSE). The importance of prompt reporting of any new breast symptoms to a health professional should be emphasized. Women who choose to do BSE should receive instruction and have their technique reviewed on the occasion of a periodic health examination (15).

4. Recommendation statements may be accompanied by an indicator of evidence quality or strength of recommendation.

Following is an example from the YGRC in which strength of recommendation and quality of evidence are indicated:

Example:

Rituximab is active in the treatment of Wm but associated with the risk of transient exacerbations of clinical effects of the disease and should only be used with caution, especially in patients with symptoms of hyper-viscosity and/or IgM levels >40 g/L. Level of evidence IIb, Grade of Recommendation B (16).

III.B.2. Several formatting indicators may be used to facilitate recognition of recommendations.

1. Enumeration of statements.

2. Boldface text.

3. Bulleted text.
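Some of the semantic and formatting indicators listed above can be approximated with pattern matching on plain text. The sketch below is a rough heuristic under stated assumptions, not the procedure used in this study, where identification was performed by human review. The word lists come from the examples above; extending "is indicated" to "are indicated" and the particular bullet characters are assumptions. Boldface cannot be detected in plain text and is omitted.

```python
import re

# Modal operators named in the text: should, must, may.
MODAL = re.compile(r"\b(?:should|must|may)\b", re.IGNORECASE)
# Statements of suitability; accepting "are" as well as "is" is an assumption.
SUITABILITY = re.compile(r"\b(?:is|are)\s+(?:appropriate|indicated)\b", re.IGNORECASE)
# Indicative headings such as "Recommendation:", "Recommendations", "Recommended".
HEADING = re.compile(r"^\s*recommend(?:ations?|ed)\b", re.IGNORECASE)
# Enumerated ("1.", "2)") or bulleted ("-", "*", "•") line openings.
ENUMERATED = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def indicators(statement):
    """Return the set of indicator names found in one statement."""
    found = set()
    if MODAL.search(statement):
        found.add("modal")
    if SUITABILITY.search(statement):
        found.add("suitability")
    if HEADING.search(statement):
        found.add("heading")
    if ENUMERATED.search(statement):
        found.add("enumerated")
    return found
```

A detector like this only flags candidate statements. It cannot tell an actionable recommendation from an assertion of fact that happens to contain a modal verb, so it would not replace the manual review.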

III.C. Characteristics of recommendations that were not easily identifiable

We found several recurring issues that interfered with our ability to reliably and consistently identify recommendation statements within guideline text.

III.C.1. Clinical facts were formatted as recommendations.

Many statements, which were formatted like recommendations, were simply facts and were not actionable as written.

Example:

Suppressive therapy is effective for preventing recurrent infections. (Strength of Recommendation A-1) (17).

In this example, the statement indicates that the agent is effective and it is even supported by an indicator of strength of recommendation. However, the factual assertion does not indicate whether or under what conditions suppressive therapy should be used. A decision about whether or not to use the therapy is dependent on other unspecified considerations, such as its comparative effectiveness (vis-à-vis no therapy or other agents), its safety profile, and its cost. As written, this recommendation is not executable.

III.C.2. Guidance was deeply embedded in paragraphs.

Some statements that were actionable and provided guidance about the advisability of a clinical action were embedded in long paragraphs, without any formatting to indicate that the statements were intended to serve as recommendations. Identifying such statements as recommendations required a thorough reading and understanding of the text.

Example:

A pilot open-label study suggested that paroxetine is effective in reducing pain and other IBS symptoms. A literature search revealed only one randomized controlled trial (RCT) examining the use of an SSRI (paroxetine) for treatment of IBS. This trial did suggest an improvement in overall well-being in both depressed and non-depressed individuals with IBS. Given the limited evidence, their use is not recommended as routine or first-line therapy except in patients who also have co-morbid depression (18).

In this example, the paragraph begins with the suggestion that a drug is effective and only at the end does the reader learn that its use is not recommended.

III.C.3. Formatting of recommendations was inconsistent within the same guideline.

We noted many instances where the same formatting (e.g., bullets, boldface) was used to denote recommendations in one part of a guideline and used for other purposes or not used with other recommendations within the same guideline. This inconsistency complicated the identification of text that was intended by the guideline developers to serve as recommendations.

The following example shows inconsistent use of bullets to denote recommendations:

Example:

Oral antiviral drugs are indicated within 5 days of the start of the episode and while new lesions are still forming. (A recommendation)

Topical agents are less effective than oral agents. (An assertion of fact)

Acyclovir, valaciclovir, and famciclovir all reduce the severity and duration of episodes. Antiviral therapy does not alter natural history of the disease (19). (Two additional assertions of fact)

Such inconsistencies made it difficult in many cases to identify, count, and extract recommendations from within the guideline text.

III.D. Indication of strength of recommendation was variable and inconsistent in guidelines.

Table 4 shows the variability and inconsistency that we found in the way strength of recommendation is currently being reported in guidelines.

Insert Table 4: Results: Documentation of Strength of Recommendations in the YGRC

Variability exists because, rather than a uniform system, guideline developers used multiple different methods for demarcating strength of recommendation, e.g., alphabetic characters [grades A, B, C, D], Arabic-numbered levels [1, 2, 3, 4], and Roman numerals [I, II, III, IV] (11). Inconsistency exists because the strength of recommendation was not always indicated consistently. Some recommendation statements had a notation of strength of recommendation, whereas other statements in the same guideline did not. Following is an example from the YGRC in which only one of the two contiguous recommendations has an indication of strength:

Example:

Diet

Dietary modifications alone, such as a clear liquid diet, are inadequate for colonoscopy. However they have proven to be a beneficial adjunct to other mechanical cleansing methods (Grade IIB).

Enemas

Use enemas in patients who present to endoscopy with a poor distal colon preparation and in patients with a defunctionalized distal colon (20).

III.E. Comparison of YGRC and Suboptimally Formatted Guidelines

We excluded 310 guidelines from the YGRC because we were unable to consistently identify recommendations within them. We hypothesized that this subset of guidelines might be characterized by additional weaknesses in guideline development. We therefore compared these guidelines with the guidelines from which the YGRC was derived using variables that are encoded using the NGC's controlled vocabulary.

We found that guidelines coded as having been produced by US federal government agencies were disproportionately heavily represented (10.3% vs. 2.6%, χ² = 19.48, P < 0.00001) within the subset of guidelines with inconsistently identifiable recommendations. Conversely, guidelines produced by non-US national government agencies were disproportionately poorly represented (1.3% vs. 10.3%, χ² = 22.63, P = 0.000002) among the excluded guidelines.
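A comparison of two proportions of this kind corresponds to a chi-square test on a 2x2 table. The sketch below computes the Pearson statistic without continuity correction and the p-value for 1 degree of freedom; whether the original analysis applied a correction, and which software was used, is not stated here. The counts in the usage lines are hypothetical, chosen only to be roughly in line with the rounded percentages, and are not taken from the study data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with p taken from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical counts: guidelines with and without a given developer code
# in the excluded set (first row) and in the YGRC source set (second row).
chi2, p = chi_square_2x2(32, 278, 11, 414)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```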

Several of the NGC controlled vocabulary's authorized terms pertain to the methodology of guideline development. The percentage of guidelines that utilized systematic review and meta-analysis to analyze evidence (high-quality, transparent approaches to evidence appraisal) was higher in the YGRC than in the set of guidelines whose recommendations were not consistently identifiable (see Figure 2).

Insert Figure 2: Methods Used to Analyze Evidence

Similarly, with respect to the methods used to assess the quality and strength of evidence, guidelines in the YGRC weighted recommendations according to an explicitly stated rating scheme significantly more often than did the guidelines whose recommendations were not consistently identifiable (see Figure 3). Those guidelines whose recommendations were not consistently identifiable (1) more frequently depended on subjective reviews, (2) more frequently failed to state whether or not a rating scheme was applied, or (3) applied a rating scheme but did not supply it in the guideline.

Insert Figure 3: Methods used to Assess Quality and Strength of Evidence

IV. DISCUSSION

We developed a corpus of 1275 randomly selected recommendation statements from the National Guideline Clearinghouse and characterized the guidelines from which they were derived. We found considerable variability and inconsistency in the way guideline recommendations are currently written and reported. These deficiencies were serious enough to imperil the very identification of the statements that were intended to be clinical recommendations and thus influence clinical practice.

Guideline authors currently use both semantic and formatting indicators to signify recommendations. However, we noted that inconsistent formatting was prevalent in this guideline sample. Moreover, in many cases guideline authors used declarative statements of clinical facts in place of actionable recommendations. Such statements fail to convey critical details that are necessary to apply the knowledge in the course of clinical care. Guidelines that included suboptimally formatted recommendations were also deficient in other quality indicators such as methodology of evidence review and use of a transparent rating scheme for evidence quality and recommendation strength.

To influence patient outcomes, systems must be devised that promote adherence to guideline recommendations by targeted clinicians. Grol et al. related adherence rates to 12 attributes of guidelines, including whether or not the recommendations were concrete, evidence-based, or controversial (21). Adherence rates almost doubled when recommendations were clear and precise when compared with recommendations judged to be vague and non-specific. Likewise, Shekelle et al. found that nonspecific guideline statements actually decrease appropriate test-ordering behavior (22).

This study also demonstrates that the strength of recommendation is currently applied infrequently, variably, and inconsistently in guideline recommendations (23). Slightly more than half of the recommendations (52.7%) did not indicate strength of recommendation and 6.5% inaccurately stated strength of recommendation, in that the information presented as strength in fact described evidence quality. Quality of evidence determines the extent to which one can be confident that an estimate of effect is correct. The strength of recommendation describes the extent to which one can be confident that adherence to the recommendation will do more good than harm. Because we used the presence of recommendation strength as a determinant of identifiable recommendations, we presume that the estimate of underspecification would be even higher in the NGC as a whole.

The process of transformation of guideline-based knowledge into effective decision support systems remains an informatics challenge. Application of the concept of recommendation strength is also critical in the design of clinical decision support systems that deliver patient-specific advice at the point of care. Strong recommendations can be operationalized in systems that require adherence before allowing the user to move on. Lower level recommendations can promote appropriate practice by offering a default choice or by simply facilitating documentation.
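The mapping from recommendation strength to decision support behavior described above can be sketched as a simple lookup. The three-level scale and the behavior labels are hypothetical; guideline developers use a variety of grading schemes, and no specific system is implied.

```python
from enum import Enum


class Strength(Enum):
    """Hypothetical three-level strength scale."""
    STRONG = 3
    MODERATE = 2
    WEAK = 1


def intervention_for(strength):
    """Choose how forcefully a decision support system presents a recommendation."""
    if strength is Strength.STRONG:
        return "hard_stop"          # require adherence before the user moves on
    if strength is Strength.MODERATE:
        return "default_choice"     # preselect the recommended option
    return "documentation_aid"      # simply facilitate documentation
```

A recommendation whose strength is absent or conflated with evidence quality gives an implementer no principled basis for choosing among these behaviors.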

We believe that an ideal recommendation explicitly or implicitly answers the questions:

WHO should do WHAT to WHOM, UNDER WHAT CIRCUMSTANCES, HOW, and WHY? (24).

Vague and underspecified recommendations violate the principle of clarity set forth by the Institute of Medicine in 1992 that guideline recommendations must use unambiguous language, define terms precisely, and use logical and easy-to-follow modes of presentation (25).

Guideline recommendations would be clearer and more acceptable to users if authors and publishers adhered to the following recommendations:

1. Identify the critical recommendations in guideline text using semantic indicators (such as "The Committee recommends" or "Whenever X, Y, and Z occur clinicians should") and formatting (e.g., bullets, enumeration, and boldface text).

2. Use consistent semantic and formatting indicators throughout the publication.

3. Group recommendations together in a summary section to facilitate their identification.

4. Do not use assertions of fact as recommendations. Recommendations must be decidable and actionable.

5. Avoid embedding recommendation text deep within long paragraphs. Ideally, recommendations should be stated in the first (topic) sentence of the paragraph and the remainder of the paragraph can be used to amplify the suggested guidance.

6. Clearly and consistently assign evidence quality and recommendation strength in proximity to each recommendation and distinguish between the distinct concepts of quality of evidence and strength of recommendation.

Limitations

The National Guideline Clearinghouse may not be a representative source of the universe of guideline documents because it is limited to English-language guidelines and includes mostly guidelines created in North America. However, it is a rich source of guideline knowledge with reasonable standards for inclusion. In addition, it is widely used and is highly accessible.

We excluded a large number of guidelines from our sampling process because they included fewer than 3 or more than 100 recommendations, or presented recommendations in a tabular or algorithmic format. This might be expected to diminish the representativeness of the sample. Nonetheless, we retained a large number of guidelines that we demonstrated were representative of the NGC content.

Some formatting inconsistency may be introduced at the time guideline summaries are created. However, upon review of the 315 original guideline publications that were initially excluded from the YGRC because of formatting deficiencies, only 5 (1.6%) were found that included identifiable recommendations.

Future Work

We plan to use the YGRC as a resource for exploring the knowledge content of guidelines and to investigate and clarify problems in guideline authoring and dissemination. For example, we have completed a study of the YGRC to ascertain current patterns in the use of statements of Recommendation Strength, a parameter that is of critical importance to guideline implementers and end-users (23).

Additional planned studies involve the use of manual and natural language processing techniques to characterize the language of recommendations and to define exemplary patterns of statement clarity. The YGRC can also be used to test the adequacy of vocabularies to encode concepts necessary for guideline implementation in computer-mediated decision support systems. Although considerable work has highlighted the capacity of current vocabularies to represent patient data necessary for decision support, little work has been done to examine the fitness of these systems for expressing the knowledge-based concepts that must be represented.

Acknowledgements

This work was supported by grant LM07199, which is co-funded by the National Library of Medicine and the Agency for Healthcare Research and Quality, and by grant T15-LM07065 from the National Library of Medicine.

References

(1) Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ 1996 Jan 13;312(7023):71-72.

(2) Shiffman RN, Shekelle P, Overhage JM, Slutsky J, Grimshaw J, Deshpande AM. Standardized reporting of clinical practice guidelines: a proposal from the Conference on Guideline Standardization. Ann.Intern.Med. 2003 Sep 16;139(6):493-498.

(3) Michie S, Lester K. Words matter: increasing the implementation of clinical guidelines. Qual.Saf.Health.Care. 2005 Oct;14(5):367-370.

(4) Codish S, Shiffman RN. A model of ambiguity and vagueness in clinical practice guideline recommendations. AMIA.Annu.Symp.Proc. 2005:146-150.

(5) Tierney WM, Overhage JM, Takesue BY, Harris LE, Murray MD, Vargo DL, et al. Computerizing guidelines to improve care and patient outcomes: the example of heart failure. J.Am.Med.Inform.Assoc. 1995 Sep-Oct;2(5):316-322.

(6) Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ 2004 Jun 19;328(7454):1490.

(7) Schunemann HJ, Jaeschke R, Cook DJ, Bria WF, El-Solh AA, Ernst A, et al. An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am.J.Respir.Crit.Care Med. 2006 Sep 1;174(5):605-614.

(8) American Academy of Pediatrics Steering Committee on Quality Improvement and Management. Classifying recommendations for clinical practice guidelines. Pediatrics 2004 Sep;114(3):874-877.

(9) Rosenfeld RM, Shiffman RN. Clinical practice guidelines: a manual for developing evidence-based guidelines to facilitate performance measurement and quality improvement. Otolaryngol.Head.Neck.Surg. 2006 Oct;135(4 Suppl):S1-28.

(10) Guyatt G, Gutterman D, Baumann MH, Addrizzo-Harris D, Hylek EM, Phillips B, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians task force. Chest 2006 Jan;129(1):174-181.

(11) Grading Recommendations Assessment, Development and Evaluation Working Group. GRADE. 2007; Available at: http://www.gradeworkinggroup.org/index.htm. Accessed August 8, 2007.

(12) Ebell MH, Siwek J, Weiss BD, Woolf SH, Susman JL, Ewigman B, et al. Simplifying the language of evidence to improve patient care: Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in medical literature. J.Fam.Pract. 2004 Feb;53(2):111-120.

(13) National Collaborating Center for Acute Care. The diagnosis and treatment of lung cancer. Available at: www.ngc.gov.

(14) New Zealand Guidelines Group. Management of dyspepsia and heartburn. Available at: www.ngc.gov.

(15) American Cancer Society. ACS guidelines for breast cancer screening. Available at: www.ngc.gov.

(16) British Committee for Standards in Hematology. Guidelines on the management of Waldenstroms macroglobulinemia. Available at: www.ngc.gov.

(17) Infectious Diseases Society of America. Guidelines for treatment of candidiasis. Available at: www.ngc.gov.

(18) University of Texas at Austin, School of Nursing. The efficacy of antidepressants and various psychotherapies as adjunctive treatments for irritable bowel syndrome. Available at: www.ngc.gov.

(19) British Association of Sexual Health and HIV Medical Society. 2002 National guideline for the management of genital herpes. Available at: www.ngc.gov.

(20) American Society for Gastrointestinal Endoscopy, Society of American Gastrointestinal and Endoscopic Surgeons. A consensus document on bowel preparation before colonoscopy. Available at: www.ngc.gov.

(21) Grol R, Dalhuijsen J, Thomas S, Veld C, Rutten G, Mokkink H. Attributes of clinical guidelines that influence use of guidelines in general practice: observational study. BMJ 1998 Sep 26;317(7162):858-861.

(22) Shekelle PG, Kravitz RL, Beart J, Marger M, Wang M, Lee M. Are nonspecific practice guidelines potentially harmful? A randomized comparison of the effect of nonspecific versus specific guidelines on physician decision making. Health Serv.Res. 2000 Mar;34(7):1429-1448.

(23) How often is strength of recommendation indicated in guidelines: A study of the Yale Guideline Recommendation Corpus. American Medical Informatics Association Annual Meeting; October 2008.

(24) Shiffman RN, Michel G, Essaihi A, Thornquist E. Bridging the guideline implementation gap: a systematic, document-centered approach to guideline implementation. J.Am.Med.Inform.Assoc. 2004 Sep-Oct;11(5):418-426.

(25) Institute of Medicine (U.S.), Committee on Clinical Practice Guidelines. Guidelines for Clinical Practice: From Development to Use. National Academy Press; 1992.

(26) Tu SW, Campbell JR, Glasgow J, Nyman MA, McClure R, McClay J, et al. The SAGE Guideline Model: achievements and overview. J.Am.Med.Inform.Assoc. 2007 Sep-Oct;14(5):589-598.

(27) Ohno-Machado L, Gennari JH, Murphy SN, Jain NL, Tu SW, Oliver DE, et al. The guideline interchange format: a model for representing guidelines. J.Am.Med.Inform.Assoc. 1998 Jul-Aug;5(4):357-372.

(28) Shiffman RN, Karras BT, Agrawal A, Chen R, Marenco L, Nath S. GEM: a proposal for a more comprehensive guideline document model using XML. J.Am.Med.Inform.Assoc. 2000 Sep-Oct;7(5):488-498.

(29) Protocure. Protocure II. 2007; Available at: http://www.protocure.org/. Accessed July 18, 2007.

Table 1. Guidelines Modeled in Several Current Representation Systems

Guideline Representation System	Guidelines Used for Development/Testing
SAGE (Standards-Based Sharable Active Guideline Environment) (26)	4 guidelines: Immunization (CDC); Diabetes (Standards of Medical Care in Diabetes 2006); Diabetic hypertension (7th JNC Report); Community-acquired pneumonia (Infectious Diseases Society of America)
GLIF (Guideline Interchange Format) (27)	4 guidelines: Breast mass workup (Borton M. Gynecological Decision Making, Decker 1988); Breast cancer treatment (Eastern Cooperative Oncology Group); Cholesterol management (NCEP); Influenza vaccine (CDC)
GEM (Guideline Elements Model) (28)	5 guidelines: Urinary tract infection; Febrile seizures; Developmental dysplasia of the hip; Asthma exacerbations; Attention deficit disorder (American Academy of Pediatrics)
Protocure (29)	2 guidelines: Jaundice (American Academy of Pediatrics); Diabetes (source not listed)

Table 2. Types of guideline developers contributing to the 1964 guidelines at the National Guideline Clearinghouse. More than one guideline developer may contribute to each guideline.

39.3%	Medical Specialty Society
15.8%	Professional Association
8.3%	National Government Agency (Non-U.S.)
6.9%	Federal Government Agency (U.S.)
6.7%	Private Nonprofit Organization
5.8%	State/Local Government Agency (Non-U.S.)
4.5%	Independent Expert Panel
4.4%	Academic Institution
3.0%	State/Local Government Agency (U.S.)
2.7%	Disease Specific Society
1.7%	Hospital/Medical Center
1.0%	Public For Profit Organization
0.7%	Private Nonprofit Research Organization
0.3%	Private For Profit Organization
0.3%	Managed Care Organization
0.2%	International Agency

Table 3. Application of Category Codes in the full NGC (1964 guidelines) and in the YGRC sample (425 guidelines). More than one category may be used to code each guideline.

NGC Percentage	NGC Number	YGRC Percentage	YGRC Number	Category Code
58.8%	1155	62.4%	265	Management
57.7%	1133	61.2%	260	Treatment
47.5%	932	43.5%	185	Evaluation
41.5%	815	34.4%	146	Diagnosis
31.2%	612	32.0%	136	Prevention
17.5%	344	21.2%	90	Risk Assessment
12.1%	237	17.2%	73	Assessment of Therapeutic Effectiveness
14.9%	292	15.5%	66	Screening
9.6%	189	11.3%	48	Counseling
2.1%	42	1.9%	8	Technology Assessment
2.1%	41	1.4%	6	Rehabilitation
0.1%	2	0.0%	0	Education
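
The percentages in Table 3 follow directly from the counts and the two denominators (1964 NGC guidelines, 425 YGRC guidelines). The short Python sketch below recomputes them and reports the difference between the YGRC and NGC for each category in percentage points. It is an illustration only and not the analysis procedure used in the study; the names `COUNTS`, `percent`, and `compare` are assumptions introduced for this example.

```python
# Illustrative recomputation of Table 3; not the authors' analysis code.
NGC_TOTAL = 1964   # guidelines in the full NGC
YGRC_TOTAL = 425   # guidelines in the YGRC sample

# category code: (NGC count, YGRC count), copied from Table 3
COUNTS = {
    "Management": (1155, 265),
    "Treatment": (1133, 260),
    "Evaluation": (932, 185),
    "Diagnosis": (815, 146),
    "Prevention": (612, 136),
    "Risk Assessment": (344, 90),
    "Assessment of Therapeutic Effectiveness": (237, 73),
    "Screening": (292, 66),
    "Counseling": (189, 48),
    "Technology Assessment": (42, 8),
    "Rehabilitation": (41, 6),
    "Education": (2, 0),
}


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def compare(counts):
    """Return (category, NGC %, YGRC %, YGRC minus NGC) for each category."""
    rows = []
    for category, (ngc_n, ygrc_n) in counts.items():
        ngc_pct = percent(ngc_n, NGC_TOTAL)
        ygrc_pct = percent(ygrc_n, YGRC_TOTAL)
        rows.append((category, ngc_pct, ygrc_pct, ygrc_pct - ngc_pct))
    return rows


rows = compare(COUNTS)
# Category whose YGRC share differs most from its NGC share
largest_gap = max(rows, key=lambda r: abs(r[3]))

for category, ngc_pct, ygrc_pct, diff in rows:
    print(f"{category}: NGC {ngc_pct:.1f}%, YGRC {ygrc_pct:.1f}%, diff {diff:+.1f}")
```

Because more than one category code may be applied to a guideline, the percentages in each column need not sum to 100.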

Table 4. Documentation of Strength of Recommendations in the YGRC

Strength Present, N (%)	Strength Absent, N (%)	Strength Inaccurately Indicated, N (%)
519 (40.7)	672 (52.7)	84 (6.6)
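
As a check on the arithmetic in Table 4, the following minimal Python sketch recomputes each percentage from the reported counts, taking the sum of the three counts as the denominator. The variable names are assumptions introduced for this illustration.

```python
# Illustrative check of the Table 4 percentages; not the authors' code.
strength_counts = {
    "present": 519,
    "absent": 672,
    "inaccurately_indicated": 84,
}

# Denominator: all recommendations in the YGRC
total = sum(strength_counts.values())

# Percentage of recommendations in each group, rounded to one decimal place
strength_pct = {
    group: round(100.0 * n / total, 1)
    for group, n in strength_counts.items()
}

print(total)
print(strength_pct)
```

The three counts sum to the 1275 recommendations in the corpus.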

Figure 1. Method of development of the Yale Guideline Recommendation Corpus

Figure 2. Methods Used to Analyze Evidence

Figure 3. Methods Used to Assess Quality and Strength of Evidence