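Clustering exercise on biological antifreeze proteins:
1. For each sample (protein sequence), compute the percentage content of each individual amino acid and use these percentages as the sample's features;
2. Compute the distances between samples using the absolute-value (Manhattan) distance;
3. Cluster the samples with various clustering methods, including the farthest-distance (complete-linkage) method;
4. Try to interpret the clustering results.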
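The first three steps can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not a reference solution: the function names (`read_fasta`, `composition`, `manhattan`, `complete_linkage`) are hypothetical, the feature alphabet is assumed to be the 20 standard amino acids, and the toy sequences at the end are made up rather than taken from the data set.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def read_fasta(text):
    """Yield (description, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition(seq):
    """Step 1: percentage of each amino acid in a non-empty sequence."""
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def manhattan(u, v):
    """Step 2: absolute-value distance, the sum of |u_k - v_k|."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(features):
    """Step 3: repeatedly merge the two clusters whose farthest members are closest.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of sample indices.
    """
    clusters = [(i,) for i in range(len(features))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(manhattan(features[i], features[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        history.append((a, b, d))
    return history


# Toy input (hypothetical sequences, not taken from the data set).
toy = """>s1 toy record
AAAA
>s2 toy record
AA
AC
>s3 toy record
GGGG
"""
records = [(h, s) for h, s in read_fasta(toy) if s]  # skip records without a sequence
features = [composition(s) for _, s in records]
for a, b, d in complete_linkage(features):
    print([records[i][0] for i in a], "+", [records[i][0] for i in b], "at distance", d)
```

With the toy input, s1 and s2 merge first at distance 50, because they differ in one residue out of four. The merge order and merge distances are the raw material for step 4. If SciPy is available, `scipy.spatial.distance.pdist` with `metric="cityblock"` followed by `scipy.cluster.hierarchy.linkage` with `method="complete"` should give the same hierarchy and makes it easy to try other linkage methods.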
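Notes:
Each line beginning with ">" is a description line, and the species is given at its end. Look up the Chinese name and the taxonomic classification of each species so that you can interpret the clustering results.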
>Q56TU0|AFP4_GADMO Type-4 ice-structuring protein - Gadus morhua (Atlantic cod).
MKYTLIAAIVVLALAQGTLAVEQSPELEKMAQFFEGMKTELMATVQKVSESLQSQTIIED
GRTQLEPIMTQIQEHLAPLATSVQEKVTPLAEDMQQKLKPYVDEFQSELESVLRKLLDQA
KAITQ
>P80961|AFP4_MYOOC Type-4 ice-structuring protein LS-12 - Myoxocephalus octodecimspinosis (Longhorn sculpin).
MKFSLVATIVLLALAQGSFAQGAADLESLGQYFEEMKTKLIQDMTEIIRSQDLANQAQAF
VEDKKTQLQPLVAQIQEQMKTVATNVEEQIRPLTANVQAHLQPQIDNFQKQMEAIIKKLT
DQTMAIEN
>Q8JI37|AFP4_PAROL Type-4 ice-structuring protein - Paralichthys olivaceus (Japanese flounder).
MKFSLIAAVALLALAQGSFAQDAADLEKITQYFENLKNKMTEDVTAFLTNQDVANQAQTF
MQERKTQLEPLATQIQEQLRAAATKFEEHITPLAANVQPVVENFQQQMEALVQKLMEKTR
SISN
>P19613|ANP11_MACAM Ice-structuring protein SP2(HPLC 11) - Macrozoarces americanus (Ocean pout) (Zoarces americanus).
SVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQIVGKQVNRIVAKGQTLMPNMVKTY
AA
>P19614|ANP12_MACAM Type-3 ice-structuring protein HPLC 12 - Macrozoarces americanus (Ocean pout) (Zoarces americanus).
NQASVVANQLIPINTALTLVMMRSEVVTPVGIPAEDIPRLVSMQVNRAVPLGTTLMPDMV
KGYPPA
>P07457|ANP1C_MACAM Ice-structuring protein SP1-C - Macrozoarces americanus (Ocean pout) (Zoarces americanus).
MKSVILTGLLFVLLCVDHMTASQSVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQI
VGKQVNTPVAKGQTLMPNMVKTYVAGK
>P12416|ANP1_ANALU Type-3 ice-structuring protein 1.9 - Anarhichas lupus (Atlantic wolffish).
MKSAILTGLLFVLLCVDHLSSASQSVVATQLIPINTALTPIMMKGQVVNPAGIPFAEMSQ
IVGKQVNRPVAKDETLMPNMVKTYRAAK
>P24028|ANP1_LYCPO Ice-structuring protein LP - Lycodes polaris (Canadian eelpout).
NKASVVANQLIPINTALTLVMMRAEVVTPAGIPAEDIPRLVGLQVNRAVLIGTTLMPDMV
KGYAPQ
>P19608|ANP1_MACAM Ice-structuring protein SP2(HPLC 1) - Macrozoarces americanus (Ocean pout) (Zoarces americanus).
SQSVVATQLIPMNTALTPVMMEGKVTNPIGIPFAEMSQIVGKQVNTPVAKGQTIMPNMVK
TYAA
>P12100|ANP1_PACBR Ice-structuring protein AB1 - Pachycara brachycephalum (Antarctic eelpout) (Austrolycichthys brachycephalus).
TKSVVASQLIPINTALTPAMMKAKEVSPKGIPAEEMSKIVGMQVNRAVNLDETLMPDMVK
TYQ
>P35751|ANP1_RHIDE Ice-structuring protein RD1 - Rhigophila dearborni (Antarctic eelpout) (Lycodichthys dearborni).
NKASVVANQLIPINTALTLIMMKAEVVTPMGIPAEEIPKLVGMQVNRAVPLGTTLMPDMV
KNYE
>P12417|ANP2_ANALU Type-3 ice-structuring protein 1.5 - Anarhichas lupus (Atlantic wolffish).
MKSAILTGLLFVLLCVDHMSSASQSVVATQLIPINTALTPIMMKGQVVNPAGIPFAEMSQ
IVGKQVNRAVAKDETLMPNMVKTYRAAK
>P12101|ANP2_PACBR Ice-structuring protein AB2 - Pachycara brachycephalum (Antarctic eelpout) (Austrolycichthys brachycephalus).
TKSVVANQLIPINTALTLVMMKAEEVSPKGIPAEEIPRLVGMQVNRAVYLDETLMPDMVK
NYE
>P12102|ANP2_RHIDE Ice-structuring protein RD2 - Rhigophila dearborni (Antarctic eelpout) (Lycodichthys dearborni).
NKASVVANQLIPINTALTLIMMKAEVVTPMGIPAEDIPRIIGMQVNRAVPLGTTLMPDMV
KNYE
>P19606|ANP3_MACAM Ice-structuring protein lambda OP-3 - Macrozoarces americanus (Ocean pout) (Zoarces americanus).