Activities from:


A Workshop at
SATURDAY, NOV 11, 2000


Seven Scenarios: A Context-Setting Activity for Studying Bioinformatics & Biotechnology
Is He Guilty?: An Introduction to Working with Sequence Data and Analysis
Exploring HIV Evolution: An Opportunity to Do Your Own Research

Background information on HIV biology

Seven Scenarios: A Context-Setting Activity for Studying Bioinformatics & Biotechnology

Parents, Police, Patents, Privacy, Patients, Profit, and Peanuts

Category Cards
Parents: expectant parents and a gene associated with a disabling condition.
  1. Scientists have identified a gene and have developed a test for the gene.
  2. This gene is a risk factor for a disabling but not fatal condition.
  3. A couple is expecting a baby.
  4. Both prospective parents "carry" this gene.
  5. The parents are concerned about the costs of raising a child with a disability.
Police: culpability of someone accused of transmitting a virus.
  1. A person is accused of sexually transmitting a virus.
  2. Police use blood tests to try to determine if that person was the source of the virus.
  3. The virus causes disease appearing years after the initial transmission.
  4. The defendant's lawyer argues that because the viruses in the accused and the accuser are so different from each other, her client should not be found guilty.
Patents: gene and drug companies seeking gene patents.
  1. A patent grants exclusive ownership of intellectual property so that the patent owner can profit from its use.
  2. Many biotechnology firms are pursuing patents for gene sequences.
  3. The companies hope that the gene sequences can be used to develop specific biological products.
  4. There is currently a rush to apply for patents on any possibly useful sequences.
Privacy: a job candidate’s pre-employment physical.
  1. A candidate for a job is required to take a pre-employment physical.
  2. Genetic analysis identifies a form of a gene that has been linked to high blood pressure.
  3. The relationship between this gene form and high blood pressure is not well understood. Some people with the gene have normal blood pressure, and many without the gene have high blood pressure. 
  4. The candidate is hired, but is told he will have to pay higher premiums for medical insurance.
Patients: a physician paying for a genetic study of possible drugs to treat her patient.
  1. A physician is treating a patient who has an aggressively lethal cancer.
  2. She pays a biotechnology company $37,000 to find potentially effective drugs.
  3. The company identifies three drugs that are then used to treat the patient.
  4. The patient's cancer goes into remission.
Profit: for-profit and not-for-profit genomic enterprises.
  1. The Human Genome Project is a consortium of academic research groups trying to determine the sequence of the human genome.
  2. HGP (Human Genome Project) is a not-for-profit project that receives a great deal of public funding.
  3. A for-profit company, Celera, uses the publicly available HGP data to check its work, fill in the gaps, and stay a step ahead.
  4. Celera maintains a private database, available to corporations by subscription for $5 to $15 million a year.
Peanuts: genetically modified organisms finding their way into the human diet.
  1. A fast food restaurant recently had to dump some food because it contained an unapproved ingredient.
  2. A strain of peanut has been engineered to resist a fungus known to wipe out a whole season's crop.
  3. New foods do not need FDA approval if they meet three conditions: (a) the nutritional value is not lowered; (b) the food is already present in the human diet; (c) the food is not an allergen.
  4. Many people are allergic to peanuts.

How it worked:

The scenarios are written up on index cards of different colors, with each scenario having a single color.
The cards are numbered in the lower right corner to show their sequence within the scenario.
On one side of each card there is one sentence (or maybe two).
On the other side is a one-word "identifier" of the scenario.
So, for example, there are 4 cards labeled "Peanuts" on one side; each has a different statement on the other side.

These cards are shuffled in a fixed order: Scenario 1 Card 1, Scenario 2 Card 1, Scenario 3 Card 1, etc., followed by Scenario 1 Card 2, Scenario 2 Card 2, Scenario 3 Card 2, etc., followed by Scenario 1 Card 3, Scenario 2 Card 3, Scenario 3 Card 3, etc. The cards are then dealt out around the room.

The shuffling is to make sure that people who are sitting together will not be sitting together when the scenario groups are assembled.
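
For anyone preparing the deck, the ordering and dealing described above can be sketched in a few lines of Python. This is only an illustration: the card counts are taken from the scenario lists (five sentences for Parents, four for the others), and the seat labels are hypothetical stand-ins for people sitting around the room.

```python
from collections import defaultdict

# Scenario identifiers from the activity, with the number of cards in each.
SCENARIOS = {
    "Parents": 5, "Police": 4, "Patents": 4, "Privacy": 4,
    "Patients": 4, "Profit": 4, "Peanuts": 4,
}

def build_deck(scenarios):
    """Order cards round by round: every scenario's card 1, then every card 2, ..."""
    deck = []
    for number in range(1, max(scenarios.values()) + 1):
        for name, count in scenarios.items():
            if number <= count:
                deck.append((name, number))
    return deck

def deal(deck, seats):
    """Hand the cards out in seating order, one card per seat."""
    return dict(zip(seats, deck))

deck = build_deck(SCENARIOS)
seats = [f"seat {i}" for i in range(1, len(deck) + 1)]  # hypothetical seating order
hands = deal(deck, seats)

# Collect the seats that end up in each scenario group.
groups = defaultdict(list)
for seat, (name, number) in hands.items():
    groups[name].append(seat)

print(deck[:8])
print(groups["Peanuts"])
```

With this ordering, no two neighboring seats hold cards from the same scenario, so each group draws its members from different parts of the room.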

The person having card 1 of Scenario 1 is asked to identify herself and her scenario name (as well as the table where she wants her group to meet).  The same is asked in order of each holder of each card 1, through Scenario 7.

Once assembled, the groups are asked to have each person read their card to their group, and then to consider three questions within the group (these three questions are written on a separate card that is given to each group):

Three questions to be considered by each group
1. Is there any information about the scenario that you wish you had or that you felt was missing?  In other words, is there enough information to consider?
2. What issues (philosophical, historical, political, scientific, ethical) arise in discussion of this scenario?
3. What kind of research or investigation would you consider doing based on this scenario?

The groups are asked to discuss their scenarios for a while, after which they are to read the sentences of their scenario and then share highlights of the discussion.

Everyone talks within the scenario groups, first by reading their sentence to the group. This sets the tone for discussion, first because each person's voice is heard (authoritatively, i.e., reading), and second because each sentence is necessary to the scenario. The small groups tend to be less intimidating, and each group has a chance to coalesce, or at least to interact meaningfully as a group.

Then, scenario by scenario, the groups read their sentences (again with each person reading his or her own sentence), and then one member of the scenario reports on the discussion, usually referring to the 3 questions.

By the time the activity is over, the context for studying bioinformatics and for doing bioinformatics activities (both paper and pencil as well as using Biology Student Workbench) should be well set.



Is He Guilty?: An Introduction to Working with Sequence Data and Analysis

This is a short paper-and-pencil exercise to help you get warmed up to working with sequence data. The exercise is built around a famous investigation in the early 1990s in which a dentist in Florida was accused of spreading HIV to some of his patients during invasive dental procedures. In addition to using virus sequence data to determine whether the dentist was the source of the patients' HIV, you will get experience with:

  • the types of information that are associated with sequence data submitted to public research databases
  • the differences between working with nucleic acid sequences (DNA) and amino acid sequences (protein)
  • the ways we can read similarities and differences between sequences
  • how a multiple sequence alignment summarizes the comparisons of sequences
  • how a phylogenetic tree graphically represents the differences between sequences and can be used to develop hypotheses about their evolutionary relationships
  • how the evolutionary relationships between sequences can be used as forensic evidence

    Background on the case
    Epidemiological research into the source of HIV infection in an AIDS patient with no known risk factors identified one possible source: the patient had undergone an invasive procedure performed by a dentist with AIDS. Further research found that six other patients of this dentist were HIV-infected. A molecular analysis was done by the Stanford University School of Medicine and the Centers for Disease Control to determine whether the patients of the Florida dentist contracted the virus from him. By comparing the genetic sequences of a virus gene from blood samples of the dentist, his patients, and other HIV+ individuals in the community who had no contact with the dentist, scientists worked to determine whether the dentist's and the patients' viruses were related. We will use this scenario to learn about comparing sequences and inferring evolutionary relationships based on their similarities and differences.

    -Popular literature

    Gentile, B. (1991). Doctors with AIDS. Newsweek. 48-56.

    - An examination of the doctor/patient relationship and of whether a person's HIV status should be revealed. Includes stories of health care professionals who continued to work after they knew they were HIV positive, and a look at the Florida dentist who infected his patients with HIV.

    -Scientific Literature

    Ou, C.Y.; Ciesielski, C.A.; Myers, G. et al. (1992). Molecular Epidemiology of HIV Transmission in a Dental Practice. Science. 256:1165-1171.

    - A molecular analysis done by the Stanford University School of Medicine and the Centers for Disease Control to determine whether the seven patients of the Florida dentist who were found to be HIV positive contracted the virus from him. Portions of the HIV proviral envelope gene from each of the seven patients, the dentist, and thirty-five HIV-infected people within the geographic area were amplified by polymerase chain reaction and sequenced. Accession numbers are given for the viruses used in the investigation so that the class can reproduce the findings.

    Molecular epidemiology of HIV transmission in a dental practice.
    Science. 1992 May 22;256(5060):1165-71.
    PMID: 1589796; UI: 92271245
    234 sequences

    More Articles on the Florida Dentist

    Taking a look at sequence data

    The sequence data we are using comes from a public database called GenBank. Follow these links to take a look at a representative sequence record. [stored as a local file] [live from the Internet]

    You can also look at the abstract for the paper that these sequences were published as part of:

    Ou, C.Y.; Ciesielski, C.A.; Myers, G. et al. (1992). Molecular Epidemiology of HIV Transmission in a Dental Practice. Science. 256:1165-1171. [abstract]

    For this activity we have chosen 6 sequences to help you start exploring how genetic information can be used to determine whether the dentist was in fact the source of virus for the patients who have become HIV+. There are 4 sequences from patients, a dentist sequence, a sequence from someone who is HIV+ and lives in the area but has not had contact with the dentist (local control), and a sequence from an HIV+ individual who lives in a different part of the world (outgroup).
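
When you look at sequence records, it helps to keep in mind how a nucleic acid sequence relates to the amino acid sequence it codes for. The sketch below translates DNA into protein using the standard genetic code. The 18-base fragment is made up for illustration and is not one of the GenBank sequences, and the function name is our own.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate DNA to one-letter amino acids, three bases at a time.

    Trailing bases that do not fill a codon are ignored; codons containing
    anything other than A, C, G, or T are reported as "X".
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

dna = "ATGAGAGTGAAGGGGATC"  # toy fragment, made up for illustration
print(translate(dna))                      # 18 bases become 6 amino acids
print(translate("ATC"), translate("ATT"))  # different codons, same amino acid
```

Because several codons can code for the same amino acid, two DNA sequences can differ at a position while their protein sequences are identical there. This is one reason DNA and protein comparisons of the same gene can give different pictures.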


    Interpreting a Multiple Sequence Alignment

    While it is possible to compare raw sequence data manually, it quickly gets unwieldy when you are working with long sequences or lots of different sequences. Luckily, computers are very efficient at following instructions and performing mathematical operations. In this section you will work to interpret the output from a program that has performed a multiple sequence alignment on the 6 sequences you have been working with. The program "aligns" sequences by finding the best ways to make their different positions line up with one another and then color-codes the positions to characterize the types of differences there are. There is also information available about the % difference between pairs of sequences.

    Multiple sequence alignment for the 6 HIV sequences
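
To see where a "% difference" figure might come from, here is a minimal sketch that counts mismatched columns between pairs of already-aligned sequences. The sequences and labels are toy values made up for illustration, not the GenBank data, and the alignment program may handle gaps differently. Here, columns with a gap are simply skipped.

```python
from itertools import combinations

# Toy aligned sequences (made up). All rows have the same length;
# "-" marks a gap introduced by the alignment.
alignment = {
    "dentist":       "ATGCTAGCTAGGCTA",
    "patient_A":     "ATGCTAGCTAGGCTT",
    "local_control": "ATGATAGGTCGGATA",
    "outgroup":      "TTGATCGGTCAG-TA",
}

def percent_difference(seq1, seq2):
    """Percent of compared columns where two aligned sequences differ.

    Columns with a gap in either sequence are left out of the comparison.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no columns left to compare")
    mismatches = sum(a != b for a, b in pairs)
    return 100.0 * mismatches / len(pairs)

for (name1, s1), (name2, s2) in combinations(alignment.items(), 2):
    print(f"{name1:>13} vs {name2:<13} {percent_difference(s1, s2):5.1f}% different")
```

In this toy example the dentist and patient_A sequences differ at 1 of 15 positions, while the dentist and local_control sequences differ at 4 of 15.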


    Tree Reading

    Part of determining if the dentist is the source of the patients' HIV is seeing how the sequences group together based on their similarity. The assumption is that the sequences that are more similar are more closely related to one another - that is, they share a more recent common ancestor than the sequences from another group. It is possible to figure out the grouping patterns from a multiple sequence alignment, but we can also turn that over to the computer and allow it to generate a tree showing the relationships between the sequences.

    Tree of the relationships between the sequences
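
One simple way a computer can turn pairwise differences into a tree is to repeatedly join the two closest groups. The sketch below uses UPGMA, a basic clustering method chosen here only for illustration. It is not necessarily the method used by the tree-building program or by the original investigators. The distances and labels are made-up toy values.

```python
from itertools import combinations

def upgma(distances):
    """Cluster leaves by repeatedly joining the two closest groups (UPGMA).

    distances: dict {(a, b): d} with one entry for every pair of leaf names.
    Returns nested tuples showing the order in which groups were joined.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {leaf: 1 for pair in distances for leaf in pair}  # group -> leaf count
    while len(sizes) > 1:
        # Pick the pair of current groups with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the merged group to every other group is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

def to_newick(tree):
    """Write the nested tuples in parenthesized (Newick-style) form."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(t) for t in tree) + ")"
    return tree

# Toy % differences between sequences (made up for illustration).
toy_distances = {
    ("dentist", "patient_A"): 2, ("dentist", "patient_B"): 4,
    ("patient_A", "patient_B"): 4, ("dentist", "local_control"): 10,
    ("patient_A", "local_control"): 10, ("patient_B", "local_control"): 10,
    ("dentist", "outgroup"): 20, ("patient_A", "outgroup"): 20,
    ("patient_B", "outgroup"): 20, ("local_control", "outgroup"): 20,
}

tree = upgma(toy_distances)
print(to_newick(tree) + ";")
```

In the toy result the dentist and patient sequences join first, the local control joins next, and the outgroup joins last. That is the kind of grouping pattern to look for when reading the tree.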



    Exploring HIV Evolution: An Opportunity to Do Your Own Research

    In this activity you will have the chance to develop your own questions and use the Biology Workbench for Students to answer them. The problem space is built around a rich set of HIV sequence data which is described below.

    The Markham et al. HIV-1 env Sequence Dataset

    Richard Markham and his colleagues (1998) published research on the pattern of HIV evolution and the rate of CD4 T-cell decline in the Proceedings of the National Academy of Sciences. Along with the journal article, they submitted 666 nucleotide sequences to the GenBank database. They studied a 285 base pair region of the env gene. The gene product, membrane protein gp120, binds to the CD4 receptor site on T-lymphocytes and is involved in the entry of the virus into those cells. Markham et al. followed the evolution of this viral gene sequence in 15 subjects by collecting blood samples at six-month intervals for up to four years. For each visit, all the forms of the gene (clones) were sequenced and CD4 T-cell counts were made. This data set provides a rich resource for looking closely at the patterns of change in HIV over time.

    Summary of the data set - Data summary table

    Subjects: 15
    Number of visits: 3-9
    Number of clones per visit: 2-18
    Total number of sequences available: 666
    CD4 cell counts for each visit


    Markham RB, Wang WC, Weisstein AE, Wang Z, Munoz A, Templeton A, Margolick J, Vlahov D, Quinn T, Farzadegan H, Yu XF (1998). Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline. Proc. Natl. Acad. Sci. 95(21):12568-73.
    Pub Med ID: 98445411

    PNAS online: Vol. 95, Issue 21, 12568-12573, October 13, 1998

    The Research Scenarios

    HIV evolution scenario #1

    Your research group is working with molecular biologists and physicians to try to develop a drug therapy that is more effective at stopping infections by HIV viruses. The sequences we are working with code for a protein that sits on the outer surface of the virus and binds with a molecule on the T-cell surface, allowing the virus to enter the cell and ultimately destroy it (and with it our immune capabilities). Drugs that block the HIV binding protein have had limited short-term success. The major downfall is that the HIV genetic information changes rapidly (mutates), and the change in sequence changes how well the drug therapy will interfere with the HIV's ability to attach to the T-cell. What you need to do is look for patterns in the changes that occur in these HIV sequences over time. Are there certain positions that change more frequently than others? Do they change in predictable ways (have the same change)? You will not be able to study all the available sequences in the time we have available - how will you decide which patients' data to work with?
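
One possible starting point for the "which positions change?" question is to tally what appears in each column of a set of aligned sequences. The sketch below does this for a handful of toy sequences. They are made up for illustration and are not from the Markham data set, and the function name is our own.

```python
from collections import Counter

# Toy aligned clones, all the same length (made up for illustration).
clones = [
    "ATGCGTACGTTA",
    "ATGCGTACGTTA",
    "ATGCATACGTCA",
    "ATGCATACGGCA",
    "ATGCTTACGGCA",
]

def column_counts(sequences):
    """For each alignment column, count how often each base appears there."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(column) for column in zip(*sequences)]

# Report only the positions where more than one base was observed.
for position, counts in enumerate(column_counts(clones), start=1):
    if len(counts) > 1:
        print(position, dict(counts))
```

Positions that show several different bases, or that vary in many subjects, are candidates for sites that change frequently. Positions that always show the same substitution suggest a predictable change.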

    HIV evolution scenario #2

    Your research group is working with epidemiologists to try to understand the ways that HIV is transmitted through a population. Understanding the patterns of movement of the virus from one individual to another is an important step in designing community interventions, educational programs, and other public health approaches to stemming the spread of the HIV/AIDS epidemic. With the advent of inexpensive molecular biology tools, public health officials now have a new source of information for studying the transmission of a disease. Before they can make sense of the spread of the disease, they need to know something about how rapidly the virus is changing within an individual. They have turned to your research group for help understanding these patterns of change. Is the change occurring at a steady rate within an individual? Is the rate of change consistent between individuals?
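
A rough way to approach the rate question is to measure how far each later sequence has diverged from the first visit and see how that divergence grows with time. The sketch below does this for one hypothetical subject with one toy sequence per visit. The real data set has several clones per visit, so you would need to decide how to summarize them. All values here are made up.

```python
# Toy data for ONE hypothetical subject: (years since first visit, sequence).
visits = [
    (0.0, "ATGCGTACGTTAGCTA"),
    (0.5, "ATGCGTACGTTAGCTT"),
    (1.0, "ATGCATACGTTAGCTT"),
    (1.5, "ATGCATACGGTAGCTT"),
    (2.0, "ATGCATACGGTAGATT"),
]

def divergence(reference, sequence):
    """Fraction of sites at which a sequence differs from the reference."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(reference, sequence)) / len(reference)

def substitution_rate(visits):
    """Least-squares slope of divergence from the first visit against time
    (differences per site per year)."""
    reference = visits[0][1]
    times = [t for t, _ in visits]
    divs = [divergence(reference, s) for _, s in visits]
    t_mean = sum(times) / len(times)
    d_mean = sum(divs) / len(divs)
    numerator = sum((t - t_mean) * (d - d_mean) for t, d in zip(times, divs))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator

for years, sequence in visits:
    print(f"{years:.1f} years: divergence {divergence(visits[0][1], sequence):.4f}")
print(f"estimated rate: {substitution_rate(visits):.4f} differences per site per year")
```

If the divergence values fall close to a straight line, the change within that individual is roughly steady. Comparing the slopes from different subjects is one way to ask whether the rate is consistent between individuals.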

    Getting Started with the Biology Workbench for Students

    The Biology Workbench for Students

    If you search for "Markham and Wang" you will see the 666 sequences from this study and you can choose the ones that you would like to work with.

    Background Information on HIV Biology

    Handout on levels of information [genome/gene/sequence/structure]

    Information to accompany figure: The HIV genome is about 9,200 RNA bases long (it is a retrovirus). It has 10 genes (transcribed units) that code for 17 different protein products. We will look at part of the env (envelope) gene, which codes for 2 proteins that make up the outside of the protein coat or envelope. One of those proteins (gp120) sits on the outer surface of the virus, binds to immune cells (CD4 receptors on T-lymphocytes), and avoids antibodies. The other envelope protein (gp41) sticks through the viral membrane and holds the surface protein. We are not looking at all of the surface protein sequence (it is about 1.5K bases long), just one of the variable regions (V3), which is thought to be involved in making contact with the immune cells that the virus attacks.

    HIV biology background
    Cells Alive HIV tutorial