Summary: Biosystems Data Analysis
A: Data properties, preprocessing, p-value
OMICS is not high-throughput data. Explain what it is, then.
- High throughput = quantification of a single component in a large number of samples (in a short time)
- i.e. one concentration is determined in many samples
- Multiplex technologies = quantification of a large number of (related) components in a single sample
- --> OMICS
-
Give an overview of the RNA-seq procedure.
- Stop all activity in the sample (quenching)
- Isolate mRNA
- Reverse transcription (RNA --> DNA)
- Optional amplification by PCR
- Library construction (attaching sequence tags/adaptors)
- Sequencing
-
Which points should you consider in regard to quenching?
- RNAs have short half-lives in cells
- RNases have to be stopped
- Stress of handling can induce gene expression
- Breaking cells (bacteria) can be hard
- Obtaining sample can be time-consuming
-
Which points should you consider for RNA isolation?
- Most RNA is ribosomal RNA
- Eukaryotic mRNA can be enriched using poly-A hybridization
-
Name the two types of variation.
- Biological variation
- = variation of interest
- Variation between similar samples (individuals)
- Technical variation
- = variation to filter out
-
Name sources of technical variation (e.g. RNA-seq).
- Sample preparation (media, temperature...)
- Sample isolation (handling, speed of sequencing)
- Differences in mRNA quality
- cDNA synthesis
- Amount of cDNA added
- Sequence bias
- Random measurement error (it cannot be removed, only reduced by repeating the measurement many times and taking the average)
-
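The point about random measurement error in the card above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the summary: the noise is taken to be Gaussian, and the function name `spread_of_means`, the true value of 100 and the noise level of 10 are made up for illustration.

```python
import random
import statistics


def spread_of_means(n_replicates, n_trials=2000, true_value=100.0,
                    noise_sd=10.0, seed=0):
    """Standard deviation of the average of n_replicates noisy measurements.

    Assumption: each measurement is the true value plus Gaussian noise.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_trials):
        measurements = [rng.gauss(true_value, noise_sd)
                        for _ in range(n_replicates)]
        means.append(statistics.fmean(measurements))
    return statistics.stdev(means)


single = spread_of_means(1)     # one measurement per sample
averaged = spread_of_means(25)  # average of 25 repeated measurements

print(f"spread with 1 measurement:   {single:.2f}")
print(f"spread with 25 measurements: {averaged:.2f}")
```

Under these assumptions the spread of the average shrinks roughly with the square root of the number of repeats, so the error becomes smaller but never reaches zero.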
Where does bias in RNA-seq come from?
- Variation in amount of isolated mRNA
- Variation in quality of isolated mRNA
- Variation in quenching efficiency
- Variation in cDNA synthesis efficiency
- Variation in sequencing efficiency (number of sequences read)
Or it is due to interesting biological variation in the amount of mRNA!
-
Give 3 subtle bias effects in RNA-seq.
- Fragment length (size selection)
- Position (degradation)
- Sequence bias (high GC --> lower counts)
-
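To check for the GC-related sequence bias named in the card above, the GC fraction of each fragment or gene has to be known first. A minimal sketch of that first step follows; the helper name `gc_fraction` and the example sequences are hypothetical, and the code only measures GC content, it does not correct any bias.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the called bases (A, C, G, T, U)."""
    called = [base for base in sequence.upper() if base in "ACGTU"]
    if not called:
        return 0.0  # nothing to count, e.g. empty string or only N's
    gc = sum(base in "GC" for base in called)
    return gc / len(called)


# Hypothetical example sequences, not taken from real data.
for name, seq in [("at_rich", "ATATTAGC"), ("gc_rich", "GGCCGCAT")]:
    print(name, gc_fraction(seq))
```

Plotting counts against such a GC fraction is one way to see whether high-GC regions really get lower counts in a given data set.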
When executing normalization, you need to assume hypotheses about the origin of observed variations in sequencing counts. Give the two hypotheses and subsequent procedures.
H1: Approximately equal concentration of mRNA in each sample
- Implies variations in total counts per sample are due to technical reasons
- Solution: divide sequence count for each gene by the total sequence count of the sample (scaled to one million reads)
- Result: RPM: Reads Per Million reads
H2: Approximately constant number of sequences per kilobase of mRNA
- Variations in counts between genes are due to gene length
- Solution: divide sequence count for each gene by total sequence count and by length of gene in kilobases
- Result: RPKM: Reads Per Kilobase per Million reads
-
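The two normalizations in the card above can be sketched in a few lines. The function names `rpm` and `rpkm`, the gene names, the counts and the gene lengths are all made up for illustration; real pipelines usually work on count matrices and count only mapped reads.

```python
def rpm(counts):
    """Reads per million: gene count divided by total sample count, times 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: RPM divided by gene length in kb."""
    per_million = rpm(counts)
    return {gene: per_million[gene] / (lengths_bp[gene] / 1000)
            for gene in counts}


# Hypothetical sample: raw read counts and gene lengths in base pairs.
sample_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
gene_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 4000}

sample_rpm = rpm(sample_counts)
sample_rpkm = rpkm(sample_counts, gene_lengths)

for gene in sample_counts:
    print(gene, sample_rpm[gene], sample_rpkm[gene])
```

In this toy sample geneB has three times the raw count of geneA but is also three times as long, so both end up with the same RPKM. That is the effect hypothesis H2 is meant to correct for.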
How can we use RPM and RPKM?
RPM: Allows comparison of the same gene between samples
RPKM: Allows comparison of the same gene between samples and, on top of that, between different genes
- but the underlying hypothesis is debatable (subtle bias