03. Redefining the CDF
Published: 2019-06-12


  The paper "Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data" states: "Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions". In other words, the original Affymetrix probe set definitions are inaccurate and outdated, so the probe sets should be redefined (Custom CDF).

 

The redefined formats include:

Name        Meaning
ENTREZG
REFSEQ
ENSG
ENSE
ENST
VEGAG
VEGAE
VEGAT
TAIRG
TAIRT
UG
MIRBASEF
MIRBASEG

The species include:

Name                         Meaning                      Abbreviation
Anopheles gambiae            African malaria mosquito     Ag
Arabidopsis thaliana         thale cress                  At
Bos taurus                   cattle                       Bt
Caenorhabditis elegans       roundworm (nematode)         Ce
Canis familiaris             dog                          Cf
Danio rerio                  zebrafish                    Dr
Drosophila melanogaster      fruit fly                    Dm
Gallus gallus                chicken (red junglefowl)     Gg
Homo sapiens                 human                        Hs
Macaca mulatta               rhesus macaque               MAmu
Mus musculus                 house mouse                  Mm
Oryza sativa                 rice                         Os
Rattus norvegicus            brown rat                    Rn
Saccharomyces cerevisiae     brewer's yeast               Sc
Schizosaccharomyces pombe    fission yeast                Sp
Sus scrofa                   wild boar (pig)              Ss

 

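As mentioned in "02 CDF Files", each chip model has a corresponding CDF package. A redefined CDF is named as follows (the plus signs denote concatenation and are not part of the name):

CDF package name + lowercase species abbreviation + lowercase format name + "cdf"

 

The HG-U133_Plus_2 array originally corresponds to the hgu133plus2 package. If you choose Homo sapiens (human) and the ENSG format, the corresponding package is hgu133plus2hsensgcdf.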
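
The naming rule can be sketched in a few lines of base R. This is only an illustration: `custom_cdf_name` is a hypothetical helper (it is not part of affy), and it assumes that hyphens, underscores, and spaces in the array name are dropped and that the name ends in "cdf", as in the example above.

```r
# Hypothetical helper (not part of affy): build a custom CDF package name
# from the array/CDF name, the species abbreviation, and the format name.
custom_cdf_name <- function(base, species, format) {
  # Assumption: separators in the array name are removed before lowercasing.
  base <- gsub("[-_ ]", "", base)
  paste0(tolower(base), tolower(species), tolower(format), "cdf")
}

custom_cdf_name("hgu133plus2", "Hs", "ENSG")
custom_cdf_name("HG-U133_Plus_2", "Hs", "ENSG")
```

Both calls produce the package name used in this post, hgu133plus2hsensgcdf.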

 

To get the probe set names of hgu133plus2, you can run:

 

    library(affy)                          ## load the affy package
    cdfname <- "hgu133plus2"
    ## cdfname <- "hgu133plus2hsensgcdf"   ## use this instead for the redefined CDF
    how <- getOption("BioC")$affy$probesloc
    verbose <- FALSE
    envir <- NULL
    for (i in seq_along(how)) {
        cur <- how[[i]]
        envir <- switch(cur$what,
            environment = cdfFromEnvironment(cdfname, cur$where, verbose),
            libPath = cdfFromLibPath(cdfname, cur$where, verbose = verbose),
            bioC = cdfFromBioC(cdfname, cur$where, verbose))
        if (is.environment(envir)) break   ## stop at the first lookup that succeeds
    }
    genenames <- ls(envir)                 ## probe set names

    > length(genenames)                    ## number of probe sets
    [1] 54675
    > genenames[1:100]                     ## print the first 100 probe set names
      [1] "1007_s_at"    "1053_at"      "117_at"       "121_at"
      [5] "1255_g_at"    "1294_at"      "1316_at"      "1320_at"
      [9] "1405_i_at"    "1431_at"      "1438_at"      "1487_at"
     [13] "1494_f_at"    "1552256_a_at" "1552257_a_at" "1552258_at"
     [17] "1552261_at"   "1552263_at"   "1552264_a_at" "1552266_at"
     [21] "1552269_at"   "1552271_at"   "1552272_a_at" "1552274_at"
     [25] "1552275_s_at" "1552276_a_at" "1552277_a_at" "1552278_a_at"
     [29] "1552279_a_at" "1552280_at"   "1552281_at"   "1552283_s_at"
     [33] "1552286_at"   "1552287_s_at" "1552288_at"   "1552289_a_at"
     [37] "1552291_at"   "1552293_at"   "1552295_a_at" "1552296_at"
     [41] "1552299_at"   "1552301_a_at" "1552302_at"   "1552303_a_at"
     [45] "1552304_at"   "1552306_at"   "1552307_a_at" "1552309_a_at"
     [49] "1552310_at"   "1552311_a_at" "1552312_a_at" "1552314_a_at"
     [53] "1552315_at"   "1552316_a_at" "1552318_at"   "1552319_a_at"
     [57] "1552320_a_at" "1552321_a_at" "1552322_at"   "1552323_s_at"
     [61] "1552325_at"   "1552326_a_at" "1552327_at"   "1552329_at"
     [65] "1552330_at"   "1552332_at"   "1552334_at"   "1552335_at"
     [69] "1552337_s_at" "1552338_at"   "1552340_at"   "1552343_s_at"
     [73] "1552344_s_at" "1552347_at"   "1552348_at"   "1552349_a_at"
     [77] "1552354_at"   "1552355_s_at" "1552359_at"   "1552360_a_at"
     [81] "1552362_a_at" "1552364_s_at" "1552365_at"   "1552367_a_at"
     [85] "1552368_at"   "1552370_at"   "1552372_at"   "1552373_s_at"
     [89] "1552375_at"   "1552377_s_at" "1552378_s_at" "1552379_at"
     [93] "1552381_at"   "1552383_at"   "1552384_a_at" "1552386_at"
     [97] "1552388_at"   "1552389_at"   "1552390_a_at" "1552391_at"

 

 

To get the probe set names of hgu133plus2_Hs_ENSG, change cdfname in the code above to "hgu133plus2hsensgcdf". The result is:

    > length(genenames)                    ## number of probe sets after redefinition
    [1] 20009
    > genenames[1:100]                     ## first 100 probe set names after redefinition
      [1] "AFFX-BioB-3_at"              "AFFX-BioB-5_at"
      [3] "AFFX-BioB-M_at"              "AFFX-BioC-3_at"
      [5] "AFFX-BioC-5_at"              "AFFX-BioDn-3_at"
      [7] "AFFX-BioDn-5_at"             "AFFX-CreX-3_at"
      [9] "AFFX-CreX-5_at"              "AFFX-DapX-3_at"
     [11] "AFFX-DapX-5_at"              "AFFX-DapX-M_at"
     [13] "AFFX-HSAC07/X00351_3_at"     "AFFX-HSAC07/X00351_5_at"
     [15] "AFFX-HSAC07/X00351_M_at"     "AFFX-hum_alu_at"
     [17] "AFFX-HUMGAPDH/M33197_3_at"   "AFFX-HUMGAPDH/M33197_5_at"
     [19] "AFFX-HUMGAPDH/M33197_M_at"   "AFFX-HUMISGF3A/M97935_3_at"
     [21] "AFFX-HUMISGF3A/M97935_5_at"  "AFFX-HUMISGF3A/M97935_MA_at"
     [23] "AFFX-HUMISGF3A/M97935_MB_at" "AFFX-HUMRGE/M10098_3_at"
     [25] "AFFX-HUMRGE/M10098_5_at"     "AFFX-HUMRGE/M10098_M_at"
     [27] "AFFX-LysX-3_at"              "AFFX-LysX-5_at"
     [29] "AFFX-LysX-M_at"              "AFFX-M27830_3_at"
     [31] "AFFX-M27830_5_at"            "AFFX-M27830_M_at"
     [33] "AFFX-PheX-3_at"              "AFFX-PheX-5_at"
     [35] "AFFX-PheX-M_at"              "AFFX-r2-Bs-dap-3_at"
     [37] "AFFX-r2-Bs-dap-5_at"         "AFFX-r2-Bs-dap-M_at"
     [39] "AFFX-r2-Bs-lys-3_at"         "AFFX-r2-Bs-lys-5_at"
     [41] "AFFX-r2-Bs-lys-M_at"         "AFFX-r2-Bs-phe-3_at"
     [43] "AFFX-r2-Bs-phe-5_at"         "AFFX-r2-Bs-phe-M_at"
     [45] "AFFX-r2-Bs-thr-3_s_at"       "AFFX-r2-Bs-thr-5_s_at"
     [47] "AFFX-r2-Bs-thr-M_s_at"       "AFFX-r2-Ec-bioB-3_at"
     [49] "AFFX-r2-Ec-bioB-5_at"        "AFFX-r2-Ec-bioB-M_at"
     [51] "AFFX-r2-Ec-bioC-3_at"        "AFFX-r2-Ec-bioC-5_at"
     [53] "AFFX-r2-Ec-bioD-3_at"        "AFFX-r2-Ec-bioD-5_at"
     [55] "AFFX-r2-P1-cre-3_at"         "AFFX-r2-P1-cre-5_at"
     [57] "AFFX-ThrX-3_at"              "AFFX-ThrX-5_at"
     [59] "AFFX-ThrX-M_at"              "AFFX-TrpnX-3_at"
     [61] "AFFX-TrpnX-5_at"             "AFFX-TrpnX-M_at"
     [63] "ENSG00000000003_at"          "ENSG00000000005_at"
     [65] "ENSG00000000419_at"          "ENSG00000000457_at"
     [67] "ENSG00000000460_at"          "ENSG00000000938_at"
     [69] "ENSG00000000971_at"          "ENSG00000001036_at"
     [71] "ENSG00000001084_at"          "ENSG00000001167_at"
     [73] "ENSG00000001460_at"          "ENSG00000001461_at"
     [75] "ENSG00000001497_at"          "ENSG00000001561_at"
     [77] "ENSG00000001617_at"          "ENSG00000001626_at"
     [79] "ENSG00000001629_at"          "ENSG00000001631_at"
     [81] "ENSG00000002016_at"          "ENSG00000002079_at"
     [83] "ENSG00000002330_at"          "ENSG00000002549_at"
     [85] "ENSG00000002586_at"          "ENSG00000002587_at"
     [87] "ENSG00000002726_at"          "ENSG00000002745_at"
     [89] "ENSG00000002746_at"          "ENSG00000002822_at"
     [91] "ENSG00000002834_at"          "ENSG00000002919_at"
     [93] "ENSG00000002933_at"          "ENSG00000003056_at"
     [95] "ENSG00000003096_at"          "ENSG00000003137_at"
     [97] "ENSG00000003147_at"          "ENSG00000003249_at"
     [99] "ENSG00000003393_at"          "ENSG00000003400_at"

 

As the results show, hgu133plus2 has 54675 probe sets, whereas hgu133plus2hsensgcdf has only 20009. This is because some probe sets were merged, and some may have been discarded.
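
In the redefined listing, the names are either Affymetrix control probe sets (prefixed with AFFX) or an Ensembl gene ID followed by "_at". A minimal sketch of separating the two and recovering the gene IDs is shown below. It uses only base R and a handful of names copied from the listing in place of the full genenames vector; the variable names `is_control` and `ensembl_ids` are illustrative only.

```r
# A few names copied from the redefined listing, standing in for the
# full genenames vector returned by ls(envir).
genenames <- c("AFFX-BioB-3_at", "AFFX-TrpnX-M_at",
               "ENSG00000000003_at", "ENSG00000000005_at",
               "ENSG00000000419_at")

# Control probe sets carry the AFFX prefix.
is_control <- grepl("^AFFX", genenames)

# For the remaining probe sets, drop the trailing "_at" to get the gene ID.
ensembl_ids <- sub("_at$", "", genenames[!is_control])

sum(is_control)   # number of control probe sets in this sample
ensembl_ids       # Ensembl gene IDs without the "_at" suffix
```

Applied to the full vector, the same two lines would give the number of control probe sets and the list of Ensembl genes covered by the redefined CDF.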

 

Reposted from: https://www.cnblogs.com/xianwen/p/4522202.html
