Download: COSMIC
There are three COSMIC mutation datasets for coding mutations:
COSMIC Complete Mutation Data (Targeted Screens)
A tab separated table of the complete curated COSMIC dataset (targeted screens) from the current release. It includes all coding point mutations, and the negative data set.
COSMIC Mutation Data (Genome Screens)
A tab separated table of coding point mutations from genome wide screens (including whole exome sequencing).
COSMIC Mutations Data
A tab separated table of all COSMIC coding point mutations from targeted and genome wide screens from the current release.
The COSMIC Mutations Data set was chosen because it combines the data from both the Targeted and Genome Screens.
Downloaded File: COSMIC_SNPs_June_2022.tsv
NOTE: Downloading the mutation datasets requires a COSMIC login. An account can be created for free with an academic email address, after which the download can be performed.
Fields
The COSMIC dataset contains a large number of fields, many of which were filtered out in order to speed up processing in subsequent steps.
A ‘simplified’ version of the file was created by selecting specific columns from the original downloaded file with the command-line tool awk.
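
The exact awk command is not recorded here. As a rough illustration of the same column selection, the following Python sketch keeps columns by header name rather than by position. It is not the original command: the function name `simplify` is hypothetical, and the header spellings in `KEEP` are assumed to match the names listed under "Fields in Simplified Version".

```python
# Header names assumed to match the downloaded file exactly (an assumption;
# adjust the spelling or capitalisation if the real header differs).
KEEP = [
    "Accession Number",
    "Sample name",
    "Primary site",
    "Mutation CDS",
    "Mutation AA",
    "Mutation genome position",
]


def simplify(src_path, dst_path, keep=KEEP):
    """Write only the columns named in `keep` from a tab-separated file."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        header = src.readline().rstrip("\r\n").split("\t")
        # list.index raises ValueError if a requested column is missing.
        positions = [header.index(name) for name in keep]
        dst.write("\t".join(keep) + "\n")
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            dst.write("\t".join(fields[i] for i in positions) + "\n")
```

A call such as `simplify("COSMIC_SNPs_June_2022.tsv", "COSMIC_simplified.tsv")` would then produce the reduced file; the output file name is only an example. Selecting by name avoids hard-coding column numbers, which can shift between COSMIC releases.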
Fields in Simplified Version
| Field Name | Example |
|---|---|
| Accession Number | ENST00000404621.5 |
| Sample name | H_LV-3334-1316090 |
| Primary site | breast |
| Mutation CDS | c.644C>G |
| Mutation AA | p.S215* |
| Mutation genome position | 12:124466234-124466234 |
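
The "Mutation genome position" value packs chromosome, start, and end into one string. A minimal sketch of splitting it for later steps is shown below; it assumes every value follows the `chromosome:start-end` pattern of the example above, and the names `GenomePosition` and `parse_genome_position` are hypothetical.

```python
from typing import NamedTuple


class GenomePosition(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_genome_position(value: str) -> GenomePosition:
    """Split a 'chromosome:start-end' string into its three parts."""
    chromosome, _, span = value.partition(":")
    start, _, end = span.partition("-")
    # int() raises ValueError on an empty or malformed value.
    return GenomePosition(chromosome, int(start), int(end))


print(parse_genome_position("12:124466234-124466234"))
# GenomePosition(chromosome='12', start=124466234, end=124466234)
```

The chromosome is kept as a string because not all chromosome labels are numeric (for example X and Y). Rows without a position would need to be skipped or handled before calling the function.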
All Fields from COSMIC and Field Descriptions
Taken from the ‘File Description’ drop-down menu below ‘Cosmic Mutation Data’ on the COSMIC downloads page.
| Field Name | Description |
|---|---|
| Gene name | The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier. |
| Accession Number | The transcript identifier of the gene. |
| Gene CDS length | Length of the gene, in base pairs. |
| HGNC id | If the gene is in HGNC, this id helps link it to HGNC. |
| Sample name, Sample id, Id tumour | A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids if the same sample has been entered into the database multiple times from different papers. |
| Primary Site | The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from COSMIC. COSMIC uses a standard classification system for tissue types and subtypes because they vary a lot between different papers. |
| Site Subtype 1 | Further sub classification (level 1) of the sample's tissue of origin. |
| Site Subtype 2 | Further sub classification (level 2) of the sample's tissue of origin. |
| Site Subtype 3 | Further sub classification (level 3) of the sample's tissue of origin. |
| Primary Histology | The histological classification of the sample. |
| Histology Subtype 1 | Further histological classification (level 1) of the sample. |
| Histology Subtype 2 | Further histological classification (level 2) of the sample. |
| Histology Subtype 3 | Further histological classification (level 3) of the sample. |
| Genome-wide screen | Indicates whether the entire genome/exome was sequenced. |
| GENOMIC_MUTATION_ID | Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release. |
| LEGACY_MUTATION_ID | Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers. |
| MUTATION_ID | An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build. |
| Mutation CDS | The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence. |
| Mutation AA | The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. A description of each type can be found on the Mutation Overview page. |
| Mutation Description | Type of mutation at the amino acid level (substitution, deletion, insertion, complex, fusion, unknown, etc.). |
| Mutation zygosity | Information on whether the mutation was reported to be homozygous, heterozygous, or unknown within the sample. |
| LOH | Information on whether the gene was reported to have loss of heterozygosity in the sample: yes, no, or unknown. |
| GRCh | The coordinate system used: 37 = GRCh37/Hg19 and 38 = GRCh38/Hg38. |
| Mutation genome position | The genomic coordinates of the mutation. |
| Mutation strand | Positive or negative. |
| Resistance Mutation | Indicates whether the mutation confers drug resistance (see CosmicResistanceMutations.tsv.gz for gene/drug details). |
| Mutation somatic status | Information on whether the sample was reported to be Confirmed Somatic, Previously Reported, or Variant of unknown origin. Confirmed Somatic = the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient. Variant of unknown origin = the mutation is known to be somatic but the tumour was sequenced without a matched normal. Previously observed = the mutation has been reported as somatic previously but not in the current paper. |
| Pubmed_PMID | The PubMed ID of the paper in which the sample was reported, linking to PubMed for more details of the publication. |
| Id Study | Lists the unique ids of studies that have involved this sample. |
| Sample Type, Tumour origin | Describes where the sample has originated from, including the tumour type. |
| Age | Age of the sample (if this information is provided with the publications). |
| HGVSP | Human Genome Variation Society peptide syntax. |
| HGVSC | Human Genome Variation Society coding DNA sequence syntax (CDS). |
| HGVSG | Human Genome Variation Society genomic syntax (3’ shifted). |