Ductal in situ and invasive breast carcinoma begins in the cells of the duct.
Estrogen Receptor (ER), Progesterone Receptor (PR), and Human Epidermal Growth Factor Receptor 2 (HER2)
The expression of these three proteins has been used for \(30\) years to classify breast tumors into subtypes.
Tumors of the three subtypes have very different clinical and molecular characteristics.
Each subtype is treated differently: for example, Tamoxifen for ER+ tumors and Herceptin for HER2+ tumors.
In clinical settings, HER2 status is usually measured by Fluorescence In Situ Hybridization (FISH).
High-throughput profiling of transcripts (gene expression profiling, or “transcriptomics”) has provided a more robust and detailed perspective on breast cancer (BC) heterogeneity.
Transcriptomics: An excellent set of slides describing sequencing technologies and trends. Includes nice graphics about Illumina technology. [From D Beiting, UPenn]
Let’s take a brief look at some of these.
Sequencing by Synthesis I would recommend starting with this one. It is quite high level. [From Henrik’s World]
Sequencing by Synthesis II This video is more detailed, providing my background information on the underlying chemistry. [Eric Chow, UCSF]
There is a lot of data of different types. For now we use just the transcriptome and (some) clinico-pathological data.
But you can go back and get more from the GDC anytime you would like.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.1
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# this is an R package that contains ggplot graphing tools
# amongst other things
load("~/data/GDC/TCGA_BRCA/small_brca.Rdata") # load the TCGA dataset for breast cancer
Find the \({\tt small\_brca}\) object in the environment.
A tibble is a rectangular collection of variables (in the columns) and observations (in the rows).
Here I’ve just selected \(50\) well-studied breast cancer genes (of \(>23,000\) genes in total).
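To make the rows-and-columns idea concrete, here is a minimal sketch that builds a tiny tibble by hand. The object name `toy`, the sample labels, and all expression values are invented for illustration; they are not taken from the TCGA data, and the real `tumor` column may be stored differently.

```r
library(tibble)  # also attached by library(tidyverse)

# Toy data only: sample labels and expression values are made up
toy <- tibble(
  sample = c("s1", "s2", "s3"),
  tumor  = c("tumor", "tumor", "normal"),
  ESR1   = c(12.1, 3.4, 8.0),
  ERBB2  = c(10.2, 14.8, 9.1)
)

dim(toy)        # number of rows (observations), then columns (variables)
colnames(toy)   # the variable names
toy[1:2, 1:3]   # rows 1-2 and columns 1-3, indexed as [rows, columns]
```

The same three calls should work unchanged on \({\tt small\_brca}\), which is just a much larger tibble.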
colnames(small_brca)
## [1] "id" "tss"
## [3] "participant" "barcode"
## [5] "bcr_patient_uuid" "form_completion_date"
## [7] "birth_days_to" "gender"
## [9] "menopause_status" "race"
## [11] "ethnicity" "tumor_status"
## [13] "vital_status" "death_days_to"
## [15] "histologic_diagnosis_other" "initial_pathologic_dx_year"
## [17] "age_at_diagnosis" "micromet_detection_by_ihc"
## [19] "lymph_nodes_examined_count" "ajcc_pathologic_tumor_stage"
## [21] "er_status_by_ihc" "er_status_ihc_Percent_Positive"
## [23] "pr_status_by_ihc" "pr_status_ihc_percent_positive"
## [25] "her2_fish_status" "her2_copy_number"
## [27] "histological_type" "metastatic_tumor_indicator"
## [29] "tumor" "ANLN"
## [31] "FOXC1" "CDH3"
## [33] "FGFR4" "UBE2T"
## [35] "NDC80" "PGR"
## [37] "BIRC5" "ORC6"
## [39] "ESR1" "PHGDH"
## [41] "PTTG1" "MELK"
## [43] "NAT1" "CXXC5"
## [45] "BCL2" "RRM2"
## [47] "GPR160" "EXO1"
## [49] "UBE2C" "TYMS"
## [51] "KRT5" "KRT14"
## [53] "MAPT" "CDC6"
## [55] "MMP11" "MYBL2"
## [57] "SFRP1" "CCNE1"
## [59] "BLVRA" "BAG1"
## [61] "MLPH" "CDC20"
## [63] "CENPF" "MIA"
## [65] "KRT17" "FOXA1"
## [67] "ACTR3B" "CCNB1"
## [69] "MDM2" "MYC"
## [71] "CEP55" "SLC39A6"
## [73] "ERBB2" "GRB7"
## [75] "KIF2C" "NUF2"
## [77] "EGFR" "MKI67"
## [79] "TMEM45B"
print(small_brca[1:3, 1:4])
## # A tibble: 3 x 4
## id tss participant barcode
## <chr> <chr> <chr> <chr>
## 1 TCGA-E9-A1NF-01A-11R-A14D-07 E9 A1NF TCGA-E9-A1NF
## 2 TCGA-D8-A27M-01A-11R-A16F-07 D8 A27M TCGA-D8-A27M
## 3 TCGA-BH-A0GZ-01A-11R-A056-07 BH A0GZ TCGA-BH-A0GZ
print(small_brca[1:3, 5:8])
## # A tibble: 3 x 4
## bcr_patient_uuid form_completion_date birth_days_to gender
## <chr> <chr> <chr> <chr>
## 1 a8b1f6e7-2bcf-460d-b1c6-1792a9801119 2011-6-23 -21981 FEMALE
## 2 ae65baeb-6b78-492a-8c63-bb7e93e83dc2 2011-7-17 -21910 FEMALE
## 3 27dfb9d4-3a2c-44bc-9acf-8f638d3f3004 2010-11-10 -22714 FEMALE
print(small_brca[1:3, 9:13])
## # A tibble: 3 x 5
## menopause_status race ethnicity tumor_status vital_status
## <chr> <chr> <chr> <chr> <chr>
## 1 Post WHITE NOT HISPANIC OR LATINO TUMOR FREE Alive
## 2 Post WHITE NOT HISPANIC OR LATINO TUMOR FREE Alive
## 3 Post WHITE [Not Available] TUMOR FREE Alive
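Note that every column printed above is of type \({\tt <chr>}\), including numeric-looking ones such as \({\tt birth\_days\_to}\), and that missing values appear as the text \({\tt [Not\ Available]}\). The sketch below shows one possible way to tidy this up, using only the three rows printed above. The names `first_rows`, `cleaned`, and `age_years` are hypothetical, and the age calculation assumes that \({\tt birth\_days\_to}\) counts days from a reference date back to birth.

```r
library(dplyr)   # also attached by library(tidyverse)
library(tibble)

# The three rows shown in the printed output above, retyped by hand
first_rows <- tibble(
  participant   = c("A1NF", "A27M", "A0GZ"),
  birth_days_to = c("-21981", "-21910", "-22714"),
  ethnicity     = c("NOT HISPANIC OR LATINO", "NOT HISPANIC OR LATINO",
                    "[Not Available]")
)

cleaned <- first_rows %>%
  mutate(
    # turn the placeholder text into a real missing value
    ethnicity     = na_if(ethnicity, "[Not Available]"),
    # convert the character column to numbers
    birth_days_to = as.numeric(birth_days_to),
    # assumption: negative days before a reference date, so flip the sign
    age_years     = -birth_days_to / 365.25
  )

print(cleaned)
```

Applied to the full tibble, the same `mutate()` call would convert the whole column at once, although other columns may need their own placeholder strings handled.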
OK, I don’t like the black points, and they are too big. Let’s color the points by the \({\tt tumor}\) column and shrink them.
ggplot(data = small_brca) +
geom_point(mapping = aes(x = ESR1, y = ERBB2, color = tumor), size = 0.1)
Some of our samples are tumors and some are matched normal. Perhaps not surprisingly, the normals have low expression of both oncogenes.
Note that the aesthetic (\({\tt aes}\)) contains a third argument \({\tt color}\) that uses the \({\tt tumor}\) attribute/column. Careful, note that the color is inside the \({\tt aes}\)!
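To see why the position of \({\tt color}\) matters, here is a small sketch contrasting the two cases on made-up data. The tibble `toy` and its values are invented, and `p_mapped` and `p_fixed` are hypothetical names. `layer_data()` is a ggplot2 helper that returns the values computed for each point.

```r
library(ggplot2)  # also attached by library(tidyverse)
library(tibble)

# Toy data only: expression values are made up
toy <- tibble(
  ESR1  = c(12.1, 3.4, 8.0, 7.5),
  ERBB2 = c(10.2, 14.8, 9.1, 8.7),
  tumor = c("tumor", "tumor", "normal", "normal")
)

# color INSIDE aes(): mapped to the tumor column, one color per group
p_mapped <- ggplot(data = toy) +
  geom_point(mapping = aes(x = ESR1, y = ERBB2, color = tumor))

# color OUTSIDE aes(): a fixed setting, every point is blue
p_fixed <- ggplot(data = toy) +
  geom_point(mapping = aes(x = ESR1, y = ERBB2), color = "blue")

# Inspect the colors ggplot2 assigned to the points in each plot
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
```

Inside \({\tt aes}\), ggplot2 treats \({\tt color}\) as a column to look up and adds a legend. Outside \({\tt aes}\), it treats the value as a literal color applied to every point.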
ggplot(data = small_brca) +
geom_point(mapping = aes(x = ESR1, y = ERBB2,
shape = tumor, color = ajcc_pathologic_tumor_stage), size = 1)
ggplot(data = small_brca) +
geom_bar(mapping = aes(x=lymph_nodes_examined_count, fill = ajcc_pathologic_tumor_stage))
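By default, \({\tt geom\_bar}\) counts the rows at each \({\tt x}\) value, and the \({\tt fill}\) aesthetic splits each bar into one segment per stage. As a rough sketch of the numbers behind such a plot, the same tallies can be computed with `dplyr::count()`. The tibble `toy` and its values are invented for illustration, and storing the node counts as text is an assumption.

```r
library(dplyr)   # also attached by library(tidyverse)
library(tibble)

# Toy data only: counts and stages are made up
toy <- tibble(
  lymph_nodes_examined_count  = c("1", "1", "2", "2", "2"),
  ajcc_pathologic_tumor_stage = c("Stage I", "Stage II", "Stage I",
                                  "Stage I", "Stage II")
)

# One row per (x value, fill value) pair; n is the height of that bar segment
tallies <- toy %>%
  count(lymph_nodes_examined_count, ajcc_pathologic_tumor_stage)

print(tallies)
```

Each row of `tallies` corresponds to one colored segment of a stacked bar, and the segments within a bar sum to the total number of samples at that \({\tt x}\) value.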
© M Hallett, 2022 Western University