Differences

This shows you the differences between two revisions of the page.

--- teaching:progappchim:bioinformatic [2016/03/15 11:54] – created villersd
+++ teaching:progappchim:bioinformatic [2022/09/22 16:59] (current version) – [References] villersd
@@ Line 1: / Line 1: @@
 ====== Bioinformatics ======
-Manipulating DNA, RNA and protein sequences, ...
+One of the major goals of [[wp>fr:Bio-informatique|bioinformatics]] is the automated study of sequences, mainly DNA and protein sequences.
+These sequences are freely and publicly available, notably from these two sources:
+{{wp>fr:UniProt}}
+See also the website [[https://www.uniprot.org/]]
+{{wp>fr:GenBank}}
+See also the website [[https://www.ncbi.nlm.nih.gov/genbank/]]
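A minimal sketch of downloading a protein sequence in FASTA format from UniProt using only the Python standard library (no Biopython). The endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is an assumption based on UniProt's REST interface and should be checked against the current UniProt documentation; the helper names `uniprot_fasta_url` and `fetch_fasta` are hypothetical. The download itself needs network access.

<code python>
from urllib.request import urlopen

# Assumed UniProt REST endpoint (check the current UniProt documentation).
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession: str) -> str:
    """Build the (assumed) FASTA download URL for one UniProt accession."""
    return UNIPROT_REST.format(accession=accession.strip())


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one FASTA record and return it as text (needs network access)."""
    with urlopen(uniprot_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
</code>

For instance, `fetch_fasta("P0DTC2")` is expected to return the SARS-CoV-2 spike glycoprotein record cited in the references, as a text string starting with a `>` header line.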
+===== Installing Biopython =====
+[[https://biopython.org/|Biopython]] is a Python library dedicated to the study of sequences (DNA, RNA, proteins). It must be installed before it can be used, for example:
+  * with the Anaconda distribution, either through the Anaconda-Navigator interface, using the "conda-forge" channel, or with the command: conda install -c conda-forge biopython
+  * from PyPI (pypi.org), with the command: pip install biopython
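A minimal sketch for checking whether the installation worked, assuming Python 3.8 or later (for `importlib.metadata`). Note that the package is installed under the name "biopython" but imported as `Bio`; the function name `biopython_version` is hypothetical.

<code python>
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def biopython_version():
    """Return the installed Biopython version string, or None if it is absent."""
    # The import name is "Bio"; the distribution name on PyPI is "biopython".
    if importlib.util.find_spec("Bio") is None:
        return None
    try:
        return version("biopython")
    except PackageNotFoundError:
        # Bio is importable but was installed without package metadata:
        # fall back to the version attribute of the module itself.
        import Bio
        return Bio.__version__


print(biopython_version() or "Biopython is not installed")
</code>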
 ===== Counting the nucleotides of a DNA sequence =====
-<sxh python; title : Counting_DNA_Nucleotides-01.py>
+<code python Counting_DNA_Nucleotides-01.py>
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
@@ Line 14: / Line 32: @@
 # using a list and the .count() method
-bases=["A","C","G","T"]
+bases = ["A","C","G","T"]
 for base in bases:
-    print adn.count(base),
+    print(adn.count(base), end=" ")
-print
+print()
 # Variant:
 for c in 'ACGT':
-    print adn.count(c),
+    print(adn.count(c), end=" ")
-print
+print()
 # slightly less readable variant
@@ Line 31: / Line 49: @@
 # using a list comprehension
-count=[adn.count(c) for c in 'ACGT']
+count = [adn.count(c) for c in 'ACGT']
 for val in count:
-    print val,
+    print(val, end=" ")
-print
+print()
 # another list comprehension, with formatted printing → "one-liner" version
-print "%d %d %d %d" % tuple([adn.count(X) for X in "ACGT"])
+print("%d %d %d %d" % tuple([adn.count(X) for X in "ACGT"]))
 # counting "by hand", without using built-in methods or libraries
@@ Line 47: / Line 65: @@
             count[i] +=1
 for val in count:
-    print val,
+    print(val, end=" ")
-print
+print()
 # counting "by hand", with .index()
@@ Line 56: / Line 74: @@
     count[ACGT.index(c)] += 1
 for val in count:
-    print val,
+    print(val, end=" ")
-print
+print()
 # using the collections module
@@ Line 64: / Line 82: @@
 for c in adn:
     ncount[c] += 1
-print ncount['A'], ncount['C'], ncount['G'], ncount['T']
+print(ncount['A'], ncount['C'], ncount['G'], ncount['T'])
 # collections.Counter
 from collections import Counter
-for k,v in sorted(Counter(adn).items()): print v,
+for k,v in sorted(Counter(adn).items()):
-print
+    print(v, end=" ")
+print()
 # with a dictionary
@@ Line 75: / Line 94: @@
 for c in adn:
     freq[c] += 1
-print freq['A'], freq['C'], freq['G'], freq['T']
+print(freq['A'], freq['C'], freq['G'], freq['T'])
 # with a dictionary and count(), with a different printout
 dico={}
 for base in bases:
-    dico[base]=adn.count(base)
+    dico[base] = adn.count(base)
 for key,val in dico.items():
-    print "{} = {}".format(key, val)
+    print("{} = {}".format(key, val))
-</sxh>
+</code>
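In Python 2, the trailing comma in `print x,` kept the output on the same line; in Python 3, `print(x,)` is simply a call with one argument and still ends with a newline. A minimal sketch of two Python 3 equivalents, assuming `adn` holds the DNA string (the value used here is an arbitrary example):

<code python>
adn = "GATTACA"  # arbitrary example sequence

# 1) end=" " replaces the Python 2 trailing comma: the values stay on one line
for base in "ACGT":
    print(adn.count(base), end=" ")
print()  # terminate the line

# 2) argument unpacking: print() separates its arguments with a space
counts = [adn.count(base) for base in "ACGT"]
print(*counts)  # → 3 1 1 2
</code>

The first form leaves a trailing space before the newline; `print(*counts)` does not.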
+===== Finding a motif =====
++ reading from a file
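The note above about file reading leaves that step open. A minimal sketch using only the standard library, assuming the file is in FASTA format (a `>` description line followed by one or more sequence lines); the function names `parse_fasta` and `read_fasta` are hypothetical. With Biopython installed, `SeqIO.parse(path, "fasta")` does the same job and yields `SeqRecord` objects.

<code python>
def parse_fasta(text):
    """Return a dict {header: sequence} from FASTA-formatted text.

    The header is the text after '>' on the description line; the sequence
    lines that follow are concatenated. A repeated header overwrites the
    previous record.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta(path):
    """Read a FASTA file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_fasta(handle.read())


example = """>seq1 first example
MKV
LLA
>seq2
ACDE
"""
print(parse_fasta(example))  # → {'seq1 first example': 'MKVLLA', 'seq2': 'ACDE'}
</code>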
+<code python Finding_a_Protein_Motif-01.py>
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+The complete description and characteristics of a given protein can be obtained from its "uniprot_id" identifier in the UniProt database, by inserting that identifier into this link:
+http://www.uniprot.org/uniprot/uniprot_id
+The peptide sequence in FASTA format can also be obtained from the link:
+http://www.uniprot.org/uniprot/uniprot_id.fasta
+"""
+from Bio import SeqIO
+from Bio import ExPASy
+dic = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
+    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
+    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
+    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
+    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
+    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
+    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
+    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
+    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
+    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
+    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
+    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
+    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
+    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
+    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
+    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}
+# the 20 one-letter amino-acid codes (STOP excluded), sorted alphabetically
+aminoacids = ''.join(sorted({v for v in dic.values() if v != "STOP"}))
+print(aminoacids)
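+# UniProt Protein Database access IDs; the last two use the Rosalind form
+# "accession_ENTRYNAME" (the UniProt accession is the part before the first "_")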
+proteins = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']
+handle = ExPASy.get_sprot_raw(proteins[0])
+seq_record = SeqIO.read(handle, "swiss")
+handle.close()
+print()
+print(seq_record)
+</code>
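The script above builds the codon table `dic` and retrieves one record, but does not yet search for a motif. Assuming the target is the N-glycosylation motif N{P}[ST]{P} of Rosalind's "Finding a Protein Motif" problem (the IDs in `proteins` appear to come from that problem's sample dataset), here is a minimal standard-library sketch of both translation and motif search. The names `CODONS`, `translate` and `motif_positions` are hypothetical; with Biopython, `Seq(rna).translate(to_stop=True)` would replace `translate`.

<code python>
import re
from itertools import product

# Standard genetic code, built compactly (first base varies slowest);
# equivalent to the dic table above, with "*" standing for STOP.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate an mRNA string codon by codon, stopping at the first STOP."""
    protein = []
    for i in range(0, len(rna) - 2, 3):  # complete codons only
        aa = CODONS[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# N{P}[ST]{P}: N, any residue but P, S or T, any residue but P.
# The lookahead (?=...) consumes nothing, so overlapping matches are found.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def motif_positions(protein, pattern=N_GLYCOSYLATION):
    """Return the 1-based start positions of every (overlapping) match."""
    return [m.start() + 1 for m in pattern.finditer(protein)]


print(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))  # → MAMAPRTEINSTRING
print(motif_positions("NNSTNQTA"))  # → [1, 2, 5]
</code>

Without the lookahead, the match starting at position 1 would consume the residues of the one starting at position 2, which would then be missed.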
 ===== References =====
-  * [[http://www.scienceinschool.org/2014/issue29/online_bioinf|Using biological databases to teach evolution and biochemistry]]
+  * [[http://biopython.org/wiki/Main_Page|Biopython]] (Python library for bioinformatics)
-  * [[http://rosalind.info/|Rosalind]], a platform for learning bioinformatics programming
-  * [[http://www.ncbi.nlm.nih.gov/genbank/|GenBank]]
-  * [[http://biopython.org/wiki/Main_Page|Biopython]]
   * [[https://en.wikipedia.org/wiki/Bioinformatics]]
   * [[https://en.wikipedia.org/wiki/Open_Bioinformatics_Foundation]]
   * [[https://en.wikipedia.org/wiki/FASTA_format]]
   * [[https://en.wikipedia.org/wiki/List_of_open-source_bioinformatics_software]]
-  * [[http://www.amberbiology.com/]], "Python For The Life Sciences. A gentle introduction to Python for life scientists" (forthcoming)
+  * introductory course on Biopython:
+    * [[https://bioinformaticscore.sites.vib.be/en|VIB bioinformatics core]], in particular [[https://data.bits.vib.be/pub/trainingen/Biopython/Basics_of_Biopython_1.1.pdf|this tutorial]]
+  * [[https://www.bioinformaticsalgorithms.org/|Bioinformatics Algorithms]]
+  * Articles from the journal "Science in School":
+    * [[https://www.scienceinschool.org/2010/issue17/bioinformatics|Bioinformatics with pen and paper: building a phylogenetic tree]], Cleopatra Kozlowski, 07/12/2010
+    * [[https://www.scienceinschool.org/2014/issue29/online_bioinf|Using biological databases to teach evolution and biochemistry]], Germán Tenorio, 02/06/2014
+  * documentation on phylogenetic trees: [[https://biopython.org/wiki/Phylo]]
+  * [[http://rosalind.info/|Rosalind]], a platform for learning bioinformatics programming
+    * [[http://rosalind.info/glossary/|Bioinformatics glossary]]
+  * [[https://stepik.org/catalog?language=en&q=bioinformatics|Catalog – Stepik]], programming courses and challenges, including bioinformatics activities
+    * [[https://stepik.org/course/2/promo|Bioinformatics Algorithms – Stepik]] (introductory course)
+    * [[https://stepik.org/org/bioinf|Bioinformatics Institute – Stepik]] (a Russian "virtual institute" for learning bioinformatics)
+    * [[https://stepik.org/course/945|Bioinformatics Contest 2017 – Stepik]], 2017 programming contest
+    * [[https://stepik.org/course/4377/promo|Bioinformatics Contest 2018 – Stepik]], 2018 programming contest
+    * [[https://stepik.org/course/43615/promo|Bioinformatics Contest 2019 – Stepik]], 2019 programming contest
+  * [[http://www.amberbiology.com/]] & [[https://pythonforthelifesciences.com/|Python for the Life Sciences – A gentle introduction to Python for life scientists]], programming that favors Python's standard modules (rather than Biopython, for example)
+  * [[https://www.packtpub.com/eu/application-development/bioinformatics-python-cookbook|Bioinformatics with Python Cookbook]], a book that makes extensive use of the Biopython library
+  * [[http://www.ncbi.nlm.nih.gov/genbank/|GenBank]]
+  * references on reading files:
+    * [[http://www.uniprot.org/help/programmatic_access#id_mapping_python_example]]
+    * [[http://www.python-simple.com/python-biopython/Lecture-ecriture-sequences.php]]
+  * example data related to COVID-19:
+    * [[https://www.uniprot.org/uniprot/P0DTC2|S - Spike glycoprotein precursor - Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) - S gene & protein]]