The chance that something will happen within a given amount of time. In medical studies, it’s usually the chance that someone will get a disease or die of it. For example, the statement “According to the National Cancer Institute, 1 in 68 women will get breast cancer between the ages of 40 and 50” is a statement of absolute risk.
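The "1 in 68" figure above translates directly into an absolute risk. A minimal sketch, using only the numbers from the example:

```python
# Absolute risk is simply events divided by the population at risk
# over a stated time window.
def absolute_risk(cases: int, population: int) -> float:
    """Probability of the event over the stated time window."""
    return cases / population

risk = absolute_risk(1, 68)   # the "1 in 68" example above
print(f"{risk:.4f}")          # 0.0147
print(f"{risk * 100:.1f}%")   # 1.5%
```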
Any incident that occurs following a drug, vaccine, surgery, procedure or other medical intervention. If the adverse effect was actually caused by the intervention, then it’s a side effect. But even though all side effects are types of adverse events, not all adverse events are side effects. For example, in a randomized controlled trial, adverse events that occur at roughly the same rate in the intervention group and the control group probably aren’t caused by the intervention.
The background rate of a particular condition refers to how often it occurs normally in a particular population. Researchers use the background rate of certain conditions to look for adverse events linked to medical interventions, such as pharmaceuticals or surgeries, and determine whether they occur at a higher rate among patients receiving the intervention. For example, if researchers want to determine how much higher the risk for a blood clot is following a particular surgery, they have to first find out how many people in the general population experience a blood clot even if no one in the population has had that surgery. If the background rate is 1 blood clot per 1,000 people, and those who undergo a specific surgery experience blood clots at a rate of 5 per 1,000 people, then the researchers know there is a good chance the surgery is contributing to the blood clots. (The next step would be to see if those individuals had other underlying conditions that might predispose them to blood clots.)
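The blood-clot comparison above comes down to simple arithmetic; a sketch using the hypothetical figures from the paragraph:

```python
# Rates from the example above, expressed per 1,000 people.
background_per_1000 = 1   # blood clots in the general population
surgery_per_1000 = 5      # blood clots among patients who had the surgery

rate_ratio = surgery_per_1000 / background_per_1000
excess_per_1000 = surgery_per_1000 - background_per_1000

print(rate_ratio)       # clots occur at five times the background rate
print(excess_per_1000)  # four excess clots per 1,000 surgeries
```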
Also called basic research, these are studies that focus on the fundamental building blocks of life, like cells, organelles, amino acids or genes. Basic studies are critically important to new discoveries – these are the kinds of studies that often uncover biological targets for new drugs, for example – but they don't have an immediate practical application. For this reason, reporters should take great care communicating the preliminary nature of basic scientific findings to the public.
Biases are systematic errors in the design, conduct or reporting of medical studies that produce differences between observed results and true results. Biases skew the apparent effect of a treatment or medication.
Important sources of bias that reporters should look for include: Selection bias, or key baseline differences between groups that are to be compared; Attrition bias, or between-group differences in who leaves a study; Performance bias, or differences in the care that is provided to the two groups, or differences in exposures; and Reporting bias, or differences between reported and unreported findings.
When two things are associated, such as a condition and an outcome, researchers often seek to find out whether one causes the other. If the relationship is bidirectional, then they contribute to one another, much as in a feedback loop.
For example, in investigating the relationship between smoking and lung cancer, researchers eventually determined that smoking causes lung cancer; the causation runs in only one direction, making it unidirectional. But what if behavior A contributes to outcome B and outcome B also contributes to behavior A? That would be a bidirectional relationship.
Consider the example of spanking. Research has shown that physical punishment is linked to increased aggression in children and teens. But the question remained for many years whether it was physical punishment that caused aggression or whether more aggressive children simply got spanked more often because they acted out more. Or is it a bit of column A and a bit of column B? In the case of spanking, researchers measured baseline aggression in children and then compared children who were and were not physically punished but started out with the same level of aggression. Longitudinal studies eventually revealed that children who were spanked became more aggressive, even compared to non-spanked children who started out with a similar level of aggression. Still, it is likely that the relationship is partly bidirectional over time: the more aggressive a child becomes, the more often the child may be physically punished.
A qualitative, descriptive study that focuses on an individual patient (a case series includes multiple individuals) and a particular condition, procedure, association or other phenomenon that is unusual and interesting enough to be written up on its own. Case studies in and of themselves cannot show causation, establish a trend, generalize to other individuals, inform incidence or prevalence or otherwise “prove” anything. They are used to generate hypotheses, raise awareness of a potential issue, provide instruction on a procedure, seek a mechanism for a suspected cause-and-effect relationship or related goals.
Case control study
This type of retrospective study design identifies a group of individuals who have already experienced a particular outcome or who already have a particular condition; they are called the “cases.” Then, researchers find a group of other individuals who are substantially similar to the cases in important ways — the “controls” — usually matched to the cases on the basis of age, sex, and similar demographic factors, such as geography, race/ethnicity, socioeconomic status, doctor or clinic, etc. How controls are matched to the cases will depend on the study, and many case control studies will match multiple controls to a single case (three controls to one case or 10 to one, etc.). Researchers then compare certain pre-identified factors or characteristics between the cases and the controls with the hope of identifying risk factors in the case group for the shared condition they have. For example, a case control study might bring together a group of individuals who all have high blood pressure and then match controls without high blood pressure to these cases. Then the researchers might look at dietary patterns or physical activity in the cases and the controls to see if any patterns suggest that certain aspects of diet or physical activity may contribute to high blood pressure.
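Because a case control study starts from the outcome, it cannot estimate absolute risks directly; results are usually reported as odds ratios. A hedged sketch with invented counts:

```python
# Hypothetical 2x2 table from a case control study (counts are invented):
#                exposed   unexposed
# cases:            40        60
# controls:         20        80
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 20, 80

# Odds ratio = (odds of exposure among cases) / (odds of exposure among controls),
# which simplifies to the cross-product of the table.
odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
print(round(odds_ratio, 2))  # 2.67: exposure is more common among cases
```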
When researchers measure a combination of possible clinical events in a clinical trial, they have created a composite endpoint. Composite endpoints increase the statistical power of studies, which allows scientists to run smaller, quicker and usually less expensive trials.
Composite endpoints can be useful when they measure events that are generally of equal severity and importance to patients, e.g., heart attacks and strokes. But composite endpoints can be misleading if they include surrogate endpoints, especially if those surrogates are more common and less significant than the other outcomes, e.g., a composite endpoint that includes heart attacks, strokes and cholesterol counts. In such muddled composites, the less significant events often drive the supposed effect of the intervention that’s being tested, and they can make a treatment look more effective than it actually is.
Confidence intervals are one way that researchers report statistical significance in a study. The other is the p-value.
Unlike p-values, confidence intervals report the range of plausible treatment effects, rather than just the average effect. Because of this, they can be useful sources of information for health reporters. Wide confidence intervals are generally distrusted, since they indicate that the estimate of the treatment effect is imprecise. Narrow confidence intervals, on the other hand, are usually a sign that the study is well done and that the effect of a drug or treatment is reliably reproduced from patient to patient.
Confidence intervals are not statistically significant if they include the value of no effect (normally, the effect of no treatment seen in the control group). For ratio measures such as relative risks, odds ratios and hazard ratios, the control group, or reference group, is given the value of 1, so confidence intervals that cross the number 1 are generally not statistically significant, though there are exceptions.
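As a sketch of how a 95% confidence interval around a ratio measure is commonly computed (the standard log-transform approximation for a relative risk; all counts are invented):

```python
import math

# Hypothetical trial counts: events / total in each group.
treated_events, treated_total = 30, 1000
control_events, control_total = 50, 1000

rr = (treated_events / treated_total) / (control_events / control_total)

# Standard error of log(RR), then a 95% CI computed on the log scale.
se = math.sqrt(1 / treated_events - 1 / treated_total
               + 1 / control_events - 1 / control_total)
low = math.exp(math.log(rr) - 1.96 * se)
high = math.exp(math.log(rr) + 1.96 * se)

print(f"RR {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
# This hypothetical interval stays below 1, so the result would be
# statistically significant.
```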
Conflict of interest
A set of circumstances that creates a risk, real or perceived, that professional judgment or actions concerning a primary interest will be unduly influenced by a secondary interest, such as a financial one; a conflict of interest exists whether or not a particular individual or institution is actually influenced by the secondary interest. (adapted from Institute of Medicine definition)
In observational studies, confounding variables are factors that confuse or obscure the association between a primary exposure of interest and an outcome.
For example, scientists studying the relationship between birth order and Down’s syndrome found that later-born children had much higher risks of Down’s syndrome than first-born children. When they delved deeper into the association, however, they found much of that risk was explained by maternal age. Mothers over age 40 were far more likely to have babies born with Down’s syndrome than younger mothers. At the same time, mothers having a third, fourth or fifth child are also more likely to be older. Therefore, the association between birth order and Down’s syndrome was confounded by maternal age.
Confounding is very common, and it is not always easy to tease out or control for in observational studies. It’s the main reason that randomized, controlled trials are considered a higher level of evidence than observational studies.
A covariate is a variable particular to each participant in a study (or each subject being studied, if it’s not an individual but rather, for example, a clinic) which could potentially influence the outcome. The term covariate technically includes the independent variable(s) the researcher is specifically investigating, but most often, in practice, the term refers to potential confounding variables in a study, such as age, sex, income, education, underlying conditions or other characteristics particular to the research area. Covariates and confounders can overlap but are not the same thing. All confounders definitely affect the outcomes whereas not all covariates do. In some studies, adjusting for covariates does not change the results, showing that those covariates were (usually) not influencing outcomes. Also, covariates are explicitly selected, assessed, recorded and usually calculated in a study. Confounders, on the other hand, may be covariates that were considered in the study, or they may be other variables that were not considered. (Read more on how the term can become confusing here.)
A kind of observational study that lacks temporality, or a relationship with time. Cross-sectional studies gather data about their participants at one point in time. These studies can show relationships, or associations, between different factors, but they can't show which happened first.
A press embargo means that a journal article, research study content, announcement or other news item cannot be publicized in any way until a specified date and time, typically dictated by the source of the information. For medical and other scientific studies, the release date usually is the date of publication. An embargo provides journalists extra time to do sufficient reporting and write the article before the information becomes public knowledge. The value of advance access is that it potentially reduces the mistakes that occur as journalists rush to be the first to report the news. Organizations usually require journalists to agree to the embargo, which allows them (except in highly unusual situations) to seek outside comment but not to distribute the information widely or to anyone with a relevant investment stake (including the journalist, especially in cases where insider trading may be an issue). Breaking an embargo can get a journalist barred from receiving future embargoed content from that source. The statement “For Immediate Release” at the top of a press release, email, study or other news item means the information is NOT embargoed (except in cases where the source makes a mistake).
In biology, an endemic species is one that is native to a specific region, such as the kangaroo being endemic to Australia. The cane toad, on the other hand, was a species introduced to Australia and hence was not endemic (though it is now). In epidemiology, endemic refers to the circulation of a disease within a certain population or geographic area that continues without outside interference or introduction. For example, malaria is endemic to many parts of Africa. Although malaria was once present in the U.S., it is no longer considered endemic there. Once a disease has been completely eliminated from a geographic region, such as a continent, it is no longer endemic to that region.
The cause of a disease or condition; most often, etiology refers specifically to the biological mechanisms underpinning a particular condition.
These are graphic representations of data from a meta-analysis, in which the researchers need to show the results of multiple different studies in a way that allows comparison of each individual study to the others. Hence it allows you to see “the forest” as well as each tree.
Each line is one study (usually with the authors and date included) from the meta-analysis that also includes basic information about the study, such as the population size, the hazard ratio or odds ratio, a mean and/or standard deviation related to the results, etc. What’s included depends on what the authors are focused on or what they’re comparing. Sometimes a forest plot includes a column indicating the percentage weight it contributed to the overall findings of the meta-analysis, perhaps based on number of participants or some other characteristic. Then further to the right, each study is compared based on the common outcome measure used for the results (odds ratio, standard deviation, etc.). Squares indicate the result from each study with lines extending in either direction that represent the confidence interval or range of the results. A diamond is used on the forest plot to indicate where the overall findings of the meta-analysis fall (combining all of them).
Generalizability refers to the extent to which findings in a particular study can be applied or extended to populations beyond the population studied. The differences in populations could be age, sex, gender, health status, race, ethnicity, geography, marital status or any number of other possible ways to categorize populations. For example, extremely few studies on children could be generalized to adults and vice versa. This age restriction is built into the FDA drug approval process: a drug cannot be approved for age groups that were not studied in the clinical trials because there is no evidence that the safety and effectiveness could be generalized to groups outside the age ranges initially studied. Differences can be far more subtle, however. For example, the findings of a study of transitioned transgender children’s mental health in the greater Seattle area may not be able to be generalized to the mental health of their otherwise identical counterparts in rural Mississippi because the social acceptance and cultural environment of those different geographical areas could so greatly influence the mental health of this population. Similarly, findings in a group of patients with a specific condition, such as high blood pressure, cannot be presumed to apply to patients without that condition unless a different study provides evidence for it. Any characteristic that could differ across populations could be a barrier to generalizability of findings in both clinical trials and in observational studies.
In medical research, gray literature refers to studies that have been conducted but have not been published in a peer-reviewed medical journal. They may be referenced as conference abstracts or in technical or working papers, however, giving them a faint, but not fully transparent, presence in the public domain. Gray literature is typically not easy to search, and it lacks the full accounting of materials, methods and results required for publication in a peer-reviewed journal.
Hazard ratios, which are often abbreviated HR, are one way researchers report the relative effect of a drug, treatment or exposure. A hazard ratio of 1 indicates no effect of an exposure or treatment. Hazard ratios over 1 indicate increased risk; hazard ratios under 1 indicate decreased risk. Hazard ratios are similar to, but not exactly the same as, relative risks, though they are often reported the same way.
Healthy user effect
This kind of bias may be at work in studies that find an unexpected benefit associated with treatment. It refers to the fact that people who are health conscious (they're more likely to get regular check-ups, to comply with their doctor's orders, to take their prescriptions as written, etc.) usually fare better, health-wise, than those who do not or cannot. For example, years of observational studies concluded that seniors who got flu shots had half the risk of dying from any cause during the subsequent flu seasons compared to those who didn't get flu vaccines. Thus, researchers reasoned, flu shots appeared to be a powerful way to slash an elderly person's death risk. But studies that dug a little deeper, examining individual medical records for other signs of health and frailty, found that seniors who got flu shots were simply healthier to begin with than those who didn't get their annual vaccines. It was healthy users who were surviving the winter months. Those studies found flu shots had little effect on overall survival.
In a randomized, controlled trial, the intent-to-treat (ITT) population represents all the study subjects who were randomized to the different treatment groups. It's the most inclusive group in the study because everyone is analyzed in the group to which they were randomized, even people who dropped out of the study or who didn't comply with their treatments according to study guidelines. Analysis of the ITT population tends to make two treatments look more similar, while analysis of the per-protocol population tends to emphasize treatment differences. When results are similar for both the intent-to-treat and per-protocol populations, it increases confidence in the study results.
These graphs plot the proportion of individuals surviving without an event over the study period. Time is typically depicted on the horizontal axis, while the proportion of study participants surviving is on the vertical axis. A curve is plotted for each group in the study. Separation between the curves usually indicates differences in treatment effectiveness.
One advantage of Kaplan-Meier curves is that they are able to account for all patients in a study, even those who dropped out or were lost to follow up. They help doctors get an idea of median survival times, and they help researchers compare the effects of different treatments between groups in a study.
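The way censored patients "count while at risk" can be made concrete with a minimal Kaplan-Meier estimator. This is an illustrative sketch with invented data, not a validated implementation (real analyses would use a survival-analysis library):

```python
# Each subject is (time, event): event=True means the event occurred at that
# time; event=False means the subject was censored (dropped out / lost to
# follow-up) at that time.
def kaplan_meier(subjects):
    """Return [(time, estimated survival probability)] at each event time."""
    # Convention: at tied times, events are counted before censorings.
    ordered = sorted(subjects, key=lambda s: (s[0], not s[1]))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    for time, event in ordered:
        if event:
            survival *= (at_risk - 1) / at_risk
            curve.append((time, survival))
        at_risk -= 1  # censored subjects leave the risk set without an event
    return curve

# Five hypothetical subjects; the censored ones still count while at risk.
data = [(2, True), (4, False), (5, True), (7, True), (9, False)]
for t, s in kaplan_meier(data):
    print(t, round(s, 3))
```

Note how the subject censored at time 4 contributes to the denominator for the event at time 2 but not for later events, which is how these curves account for dropouts without discarding them.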
A kind of observational study that follows study participants over time. These studies take repeated measurements of the variables of interest and may last years or even decades. Because they measure changes over time, they can establish a sequence of events. But they can't definitively prove cause-and-effect.
A meta-analysis is a statistical technique for combining the results from independent studies that have all looked at the same question. It’s often used to assess the clinical effectiveness of treatments. The value of a meta-analysis depends on the quality of the studies included in the review.
Minimally clinically important difference (MCID)
Also called “minimally important difference” or in a slightly different form, “minimally clinically important improvement.” This term refers to the smallest amount of change or effect from a treatment that matters to a patient or would result in a change in a patient’s care. For example, if a doctor is contemplating changing a patient’s medication from drug A to drug B, the minimally clinically important difference might be a 10 percent reduction in risk or a 10 percent improvement in pain. If drug B only offers, say, an 8 percent improvement, that falls below the threshold necessary for the doctor to make the change, which may especially be relevant if drug B’s side effects are more troublesome. MCID (or MID or MCII) is especially relevant for considering the clinical significance of a research finding.
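The drug-switch example above amounts to a simple threshold comparison. A sketch using the paragraph's numbers (the decision rule itself is illustrative):

```python
MCID = 0.10  # the smallest improvement that matters here: 10 percent

def worth_switching(improvement: float, mcid: float = MCID) -> bool:
    """True if the new drug's improvement meets the clinically important threshold."""
    return improvement >= mcid

print(worth_switching(0.08))  # False: an 8 percent improvement falls short
print(worth_switching(0.12))  # True
```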
The opposite of the placebo effect, a nocebo effect describes side effects or increased symptoms, rather than symptom improvement, that occur in people receiving a placebo. The nocebo effect is even less understood than the placebo effect but may relate to anxiety. See this link for more information.
Number needed to harm
This number is similar to number needed to treat (NNT) in the opposite direction: It is the number of people who need to receive an intervention (a medication, a surgery, a treatment, etc.) before one of them is harmed. Whereas a good NNT is a very low number – such as only two people taking a drug for one to benefit – a good NNH is a very high number, such as giving a medication to 1,000 people for one to experience an adverse event. The smaller the NNH, the more common adverse events are, and the riskier the intervention is.
NNH can also refer to removing a therapy. For example, anti-epileptic drugs for epilepsy can have negative long-term effects. In a systematic review, researchers looked for the appropriate timing for discontinuing medication without having patients suffer a relapse. The NNH for discontinuing medication was 8. For every 8 individuals who discontinued anti-epileptics, one would relapse and experience a seizure.
Number needed to treat
The number needed to treat, or NNT, is a way to sum up treatment effect, and unlike some statistical concepts, it’s wonderfully straightforward: It’s simply the number of patients a doctor needs to treat to help just one person.
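The arithmetic is simply the reciprocal of the absolute risk reduction. A sketch with invented risks:

```python
# Hypothetical absolute risks of a bad outcome in each arm of a trial.
control_risk = 0.08   # 8% of untreated patients have the event
treated_risk = 0.06   # 6% of treated patients do

arr = control_risk - treated_risk   # absolute risk reduction
nnt = 1 / arr
print(round(nnt))  # 50: treat 50 patients to prevent one event

# The same arithmetic applied to an *increase* in risk gives the
# number needed to harm (NNH).
```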
In observational studies, researchers look for differences between exposed and unexposed groups after people have already made their own lifestyle or treatment choices. In observational studies, researchers have no control over who’s in the exposed or unexposed groups at the start of the study. As a result, there are often fundamental differences between the two groups that can cloud the nature of the relationships under study. These differences, called confounders, can sometimes be identified and controlled for with adjustments to gathered data. But sometimes important confounders exist that are never identified. This is why observational studies can’t prove cause-and-effect. They can only show associations.
In an open label study, both the study participants/patients and the researchers/providers know what drug or treatment the participants are receiving. It’s the opposite of a blinded study, where one or both don’t know whether they are receiving the treatment being tested or a placebo. Open label studies can introduce bias into a study, but they also are sometimes used when it’s too difficult or expensive to disguise a drug, such as one that would be provided intravenously. Open label studies are also sometimes done so that the study can go on longer (patients will be more willing to take a drug for a condition they have if they know it’s the drug and not a placebo), thereby providing more information about effectiveness and safety.
The per-protocol population is the group of subjects in a randomized-controlled trial that most closely stuck to their treatment regimens. As far as investigators can tell, this is the group that took their medications as directed and came to study meetings for follow-up. The per-protocol group is a smaller subset of the intent-to-treat population. In medical studies, analysis of the intent-to-treat population tends to make treatments look more similar, while analysis of the per-protocol population tends to emphasize treatment differences. When results are similar for both the intent-to-treat and per-protocol populations, it increases confidence in the study results.
A placebo is a "fake" medicine or treatment intended to substitute for the real one, most commonly used for the control group in randomized trials. Placebos are not chemically or mechanically bioactive – they're not supposed to have any actual physical effect on the body – but they can work on the mind. Any parent who has kissed a "boo boo" or reluctantly "wasted" a bandage on an unbroken patch of skin that hurts has seen the placebo effect in action. “Sham” procedures (“sham surgery” or “sham acupuncture”) are the placebo version of non-medication interventions. The opposite is the nocebo effect. See this link for more information.
A prospective study follows people forward in time. The advantage of prospective research is that researchers can pose a question and then design a study that will (hopefully) gather the best information to answer that question.
Publication bias refers to differences between studies that get published in medical journals and those that do not. A 1991 study published in the Lancet compared published to unpublished research. Investigators found those studies that were positive, meaning they generated statistically significant results, were more than twice as likely to be published as those that did not find significant differences between the treatment and control groups.
The result of publication bias is that the body of studies available in medical literature may portray a drug or treatment as being more effective than it actually is. Systematic reviews of drug or treatment effects often try to control for publication bias. Publication bias is part of a larger group of factors that may skew study reviews called reporting biases.
Practice guidelines are developed by a panel of experts, frequently convened as a group within a professional medical society, that outlines the most up-to-date best practices based on the current evidence. Government agencies and public or private organizations may also produce clinical guidelines. These guidelines, developed after an extensive review of current literature on specific clinical areas or conditions, are designed to help both healthcare providers and patients in making clinical decisions. As noted at The George Washington University’s Himmelfarb Health Sciences Library, “Good guidelines clearly define the topic; appraise and summarize the best evidence regarding prevention, diagnosis, prognosis, therapy, harm, and cost-effectiveness; and identify the decision points where this information should be integrated with clinical experience and patient wishes to determine practice.”
Randomized controlled trial
A randomized controlled trial, or RCT, is a specific kind of scientific experiment in which researchers screen and recruit people, then randomly assign them to a treatment or control group at the start of the study. In a double-blind trial, neither the researchers nor the study participants know which group they are in. This helps to reduce performance and treatment biases. A placebo-controlled trial is one in which the control group takes a placebo, or a look-alike treatment that has no effect. High quality RCTs are considered the highest level of scientific evidence, but they are also very expensive to stage and cannot always be done in an ethical way. For example, it’s unlikely that anyone would ever design a randomized controlled trial to test the effects of smoking on lung cancer, since assigning people to smoke would certainly harm their health.
This type of bias refers to a research participant’s difficulty in accurately remembering information they are asked for in a retrospective study. For example, in a retrospective study on alcohol exposure during pregnancy, women may be asked about their alcohol intake during pregnancy much later, such as months after they have given birth. By then, they may not be able to recall their alcohol consumption as accurately as if they had been asked in a prospective study going on during their pregnancy. Another way recall bias might interfere in a case control study is that cases — those with a condition — may be more likely to remember their exposure to a particular substance or experience than those in the control group who do not have the condition being studied. Someone with asthma, for example, may more easily recall the last time they sat in stopped highway traffic than someone without asthma and therefore without as much reason to remember a circumstance that might worsen a condition they don’t have.
Relative risk, usually abbreviated RR, is a comparison of risk levels between two groups in a study, usually the treatment and the control group. Relative risks are similar to, but not exactly the same as, hazard ratios, though they are often reported the same way. A relative risk of 1 indicates no change in risk between the exposed group and the control group. Relative risks over 1 indicate that a treatment or exposure increases risk, and relative risks under 1 indicate decreased risk. For example, a relative risk of 2 indicates a doubling of average risk, while a relative risk of 0.75 means that the average risk dropped by 25 percent.
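A sketch of the calculation with invented counts (cross-multiplying the integers gives the same result as dividing the two risks, without floating-point noise):

```python
# Hypothetical event counts in two groups of a study.
exposed_events, exposed_total = 30, 200
control_events, control_total = 20, 200

# Equivalent to (30/200) / (20/200), computed exactly with integers.
rr = (exposed_events * control_total) / (exposed_total * control_events)
print(rr)              # 1.5: a 50 percent higher risk in the exposed group
print((rr - 1) * 100)  # 50.0: the percent change in average risk
```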
Also called reverse causation, this becomes a possibility when the "effect" of something could actually be its cause. For example, some studies have observed that diet soda drinkers are more likely to be obese than people who don't drink diet soda. That's led to speculation that artificial sweeteners in diet drinks may somehow cause obesity. Critics of that theory, however, have pointed out that people who are gaining weight may switch to diet drinks as a way to cut calories. Thus, they argue, the tie to obesity may be an example of reverse causality. Jodi Beggs has some funny examples of reverse causality on her blog, “Economists Do It with Models.”
Retrospective studies are observational studies that look back in time. In retrospective studies, researchers start with a population that’s already experienced a given outcome, such as cancer, and try to go back in time to find exposures that may have contributed. In many cases, the data used in retrospective studies were generated before the study was conceived, so they aren’t always a perfect fit. And researchers may not have all the information they need to definitively answer a question. In other cases, like outbreak investigations, researchers rely on retrospective data gathering to help them determine the source of an illness. They start with sick people and work back in time to find the source of the illness.
Sometimes called a run-in phase, a run-in period describes the period of time before the start of a clinical trial, before the participant receives or participates in any intervention. Data from run-in periods are used for two reasons: to screen out participants who are not eligible based on conditions or exclusions, or who show that they are unwilling or unable to comply with the intervention; or as a “washout” period to allow the effects of a previous intervention to fade before starting a new one.
A systematic review is a type of study that comprehensively reviews all the relevant studies on a specific research question or clinical topic. On the hierarchy of evidence — the weakest evidence to the strongest evidence in terms of methodology — systematic reviews are second highest, just under meta-analyses. During a systematic review, researchers review and combine all the information from published and unpublished studies they have identified that meet pre-defined criteria. They involve five steps: framing a question; identifying the relevant research; assessing the quality of the studies; summarizing the evidence; and interpreting the findings.
In translational or applied scientific studies, researchers use a body of scientific knowledge to solve a practical problem. For example, basic research identified a gene that causes a common form of the blood clotting disorder hemophilia. In a subsequent translational study, researchers used a virus to replace the broken gene in human cells, successfully treating six patients with the disease.
A washout period can describe two scenarios: a) the run-in period before a study begins during which researchers are waiting for a previous drug or intervention’s effects to wear off before they start a new one (to avoid confounding between the two interventions), or b) the period between two different interventions when researchers are waiting for the participants’ bodies to normalize after the first intervention before beginning the next one.
An example of the latter might be a study in which people participate in a strictly defined sleep, exercise or diet routine for a couple of weeks and then, after a washout period of perhaps two weeks, participate in a different strictly defined sleep, exercise or diet routine to compare the effects of each one in each person (a type of self-controlled case series design or cross-over study).