5. Python
Python Tutorial

Install Anaconda Python
URL: (https://www.anaconda.com/download/)
Easy install of data science packages (binary distribution)
Package management with conda

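Install Python packages using conda: conda install PACKAGE
Update a package to the latest version: conda update PACKAGE
Install Python packages using pip: pip install PACKAGE
Update a package using pip: pip install --upgrade PACKAGE
(PACKAGE stands for the name of the package, e.g. h5py.)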
Python language tips
Compatibility between Python 3.x and Python 2.x
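Biggest difference: print is a function rather than a statement in Python 3
The statement form (print 'hello', without parentheses) does not work in Python 3
Solution: use the __future__ module (from __future__ import print_function) so that print is a function in Python 2 as well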
Second biggest difference: some package/function names in the standard library have changed
Python 2 => Python 3 (for example, urllib2 => urllib.request, ConfigParser => configparser, Queue => queue)
Solution: use the six module
Refer to (https://docs.python.org/3/library/__future__.html) for usage of the __future__ module.
Refer to (https://pythonhosted.org/six/) for usage of the six module.
Get away from IndentationError
Python requires consistent indentation (tabs or spaces) to delimit code blocks
Best practice: always use 4 spaces. Most editors let you choose between spaces (soft tabs) and tabs for indentation.
In the vim editor, use :set list to reveal tabs and trailing whitespace and spot inconsistent indentation.
Add Shebang and encoding at the beginning of executable scripts
Create a file named welcome.py
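A minimal sketch of what welcome.py could contain. The function name welcome and the greeting text are assumptions made for illustration; only the first two lines (shebang and encoding declaration) are the point of this example.

```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Line 1 (shebang) tells the shell which interpreter to use.
# Line 2 declares the source encoding; it matters on Python 2,
# where the default source encoding is ASCII.


def welcome(name):
    """Return a greeting for the given name (hypothetical helper)."""
    return u'Welcome to Python, {}!'.format(name)


if __name__ == '__main__':
    print(welcome(u'world'))
```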
Then set the Python script as executable: chmod +x welcome.py
Now you can run the script without specifying the Python interpreter: ./welcome.py
All variables, functions, classes are dynamic objects
All Python variables are references to objects; assignment never copies the object
Use deepcopy if you really want to COPY a variable
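The following sketch contrasts plain assignment, a shallow copy and a deep copy of a nested list. The variable names are chosen for illustration only.

```python
import copy

a = [[1, 2], [3, 4]]
b = a                  # b is just another name for the same list object
c = copy.copy(a)       # shallow copy: new outer list, inner lists are shared
d = copy.deepcopy(a)   # deep copy: the inner lists are copied as well

a[0][0] = 100          # modifies an inner list
a.append([5, 6])       # modifies the outer list

print(b)  # [[100, 2], [3, 4], [5, 6]]  (same object as a)
print(c)  # [[100, 2], [3, 4]]          (shares inner lists with a)
print(d)  # [[1, 2], [3, 4]]            (fully independent)
```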
What if I accidentally overwrite my builtin functions?
You can refer to (https://docs.python.org/2/library/functions.html) for builtin functions in the standard library.
Note: in Python 3, you should import from builtins rather than __builtin__
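A short sketch of how a shadowed builtin can be recovered in Python 3. Shadowing sum is just one possible accident, picked here for illustration.

```python
import builtins  # this module is named __builtin__ in Python 2

sum = 0  # oops: the name sum now refers to an int, not the builtin function

try:
    sum([1, 2, 3])
except TypeError as e:
    error = str(e)  # calling an int fails: 'int' object is not callable

# The original function is still reachable through the builtins module.
total = builtins.sum([1, 2, 3])

# Rebind the name to the original function to undo the damage.
# (At module level, "del sum" has the same effect.)
sum = builtins.sum
```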
int is of arbitrary precision in Python!
In Python:
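For instance, the following sketch computes integers far beyond the 64-bit range without overflow or loss of precision. The variable names are illustrative.

```python
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# Larger than the biggest signed 64-bit integer, and still exact:
max_int64 = 2 ** 63 - 1
exceeds_int64 = big > max_int64

# Arithmetic stays exact: adding 1 changes the last digit only.
next_value = big + 1
```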
In R:
Easiest way to swap values of two variables
In C/C++:
In Python:
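In Python no temporary variable is needed, because the right-hand side is packed into a tuple before it is unpacked into the names on the left. A minimal example:

```python
a, b = 1, 2

# Swap in one statement: (b, a) is evaluated first, then unpacked.
a, b = b, a

print(a, b)  # 2 1
```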
List comprehension
Use for-loops:
Use list comprehension:
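As an illustration, both forms below build the squares of the even numbers below 10; the task itself is an arbitrary example.

```python
# Using a for-loop
squares_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_loop.append(x * x)

# Using a list comprehension: expression, loop and filter in one line
squares = [x * x for x in range(10) if x % 2 == 0]

print(squares)  # [0, 4, 16, 36, 64]
```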
Dict comprehension
Use for-loops:
Use dict comprehension:
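As an illustration, both forms below map each word to its length; the word list is made up for this example.

```python
words = ['apple', 'banana', 'cherry']

# Using a for-loop
lengths_loop = {}
for w in words:
    lengths_loop[w] = len(w)

# Using a dict comprehension: key and value expressions separated by a colon
lengths = {w: len(w) for w in words}

print(lengths['banana'])  # 6
```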
For the one-liners
Use ';' instead of '\n':
For more examples of one-liners, please refer to (https://wiki.python.org/moin/Powerful%20Python%20One-Liners).
Read from standard input
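sys.stdin is a file-like object, so it can be iterated line by line like any open file. The sketch below uses a hypothetical helper, count_lines_and_words, that accepts any file-like object, which also makes it easy to try out without a terminal.

```python
import sys


def count_lines_and_words(handle):
    """Count lines and whitespace-separated words in a file-like object."""
    n_lines = 0
    n_words = 0
    for line in handle:  # reads one line at a time
        n_lines += 1
        n_words += len(line.split())
    return n_lines, n_words


# In a script, pass standard input, e.g. for: cat file.txt | python script.py
# n_lines, n_words = count_lines_and_words(sys.stdin)
```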
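The order of dict keys may NOT be what you expect: before Python 3.7, dicts do not guarantee insertion order (use collections.OrderedDict when order matters)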
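A small sketch of two ways to get a predictable key order on any Python version: collections.OrderedDict keeps insertion order, and sorted() gives lexicographical order. The keys and values are made up for this example.

```python
from collections import OrderedDict

d = OrderedDict()
d['banana'] = 3
d['apple'] = 1
d['cherry'] = 2

# OrderedDict remembers the order in which keys were inserted.
keys_in_insertion_order = list(d.keys())  # ['banana', 'apple', 'cherry']

# sorted() on a dict returns its keys in sorted order.
keys_sorted = sorted(d)                   # ['apple', 'banana', 'cherry']
```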
Use enumerate() to add a number during iteration
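enumerate() yields (index, item) pairs, which avoids maintaining a counter by hand. The list below is an arbitrary example.

```python
names = ['chr1', 'chr2', 'chrX']

numbered = []
# start=1 makes the numbering begin at 1 instead of the default 0
for i, name in enumerate(names, start=1):
    numbered.append('{}: {}'.format(i, name))

print(numbered)  # ['1: chr1', '2: chr2', '3: chrX']
```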
Reverse a list
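Three common ways to reverse a list are sketched below; the first two create a new list, the third modifies the list in place.

```python
a = [1, 2, 3, 4]

b = a[::-1]            # slicing with step -1 returns a reversed copy
c = list(reversed(a))  # reversed() returns an iterator; list() materializes it
a.reverse()            # reverses a in place and returns None

print(a)  # [4, 3, 2, 1]
```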
Strings are immutable in Python
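The sketch below shows that item assignment on a string fails, and two usual workarounds that build a new string instead. The sequence is an arbitrary example.

```python
s = 'ACGT'

try:
    s[0] = 'T'          # strings do not support item assignment
    in_place_ok = True
except TypeError:
    in_place_ok = False

# Workaround 1: build a new string from slices
t = 'T' + s[1:]

# Workaround 2: convert to a list, modify it, then join back
chars = list(s)
chars[0] = 'T'
u = ''.join(chars)

print(t, u)  # TCGT TCGT
```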
Tuples are hashable while lists are not, so only tuples can be used as dict keys or set members
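A minimal illustration with made-up genomic positions as keys:

```python
counts = {}
counts[('chr1', 100)] = 5      # a tuple works as a dict key

try:
    counts[['chr1', 100]] = 5  # a list does not: it is unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Tuples can also be members of a set; duplicates are removed.
unique_pairs = set([(1, 2), (1, 2), (3, 4)])
```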
Use itertools
Nested loops in a more concise way:
Get all combinations of a list:
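A brief sketch of both uses: itertools.product replaces nested for-loops, and itertools.combinations yields all subsets of a given size (size 2 is chosen here as an example).

```python
import itertools

# Equivalent to: for x in 'AB': for y in [1, 2]: ...
pairs = list(itertools.product('AB', [1, 2]))
print(pairs)   # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]

# All unordered pairs drawn from a list
combos = list(itertools.combinations(['a', 'b', 'c'], 2))
print(combos)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```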
Convert iterables to lists
Use the zip() function to transpose nested lists/tuples/iterables
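Unpacking the rows with * passes each row as a separate argument, so zip() pairs up the first elements, then the second elements, and so on. The data below is illustrative.

```python
rows = [(1, 2, 3), (4, 5, 6)]

# zip(*rows) is the same as zip((1, 2, 3), (4, 5, 6))
columns = list(zip(*rows))

print(columns)  # [(1, 4), (2, 5), (3, 6)]
```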
Global and local variables
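A sketch of the difference, meant to be run at module level: assigning to a name inside a function creates a local variable unless the name is declared global. The function names are hypothetical.

```python
counter = 0  # module-level (global) variable


def set_local():
    counter = 100  # creates a new local variable; the global is untouched
    return counter


def increment_global():
    global counter  # refer to the module-level variable
    counter += 1
    return counter


local_value = set_local()
after_local = counter          # still 0
global_value = increment_global()
after_global = counter         # now 1
```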
Use defaultdict
Use dict:
Use defaultdict:
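The sketch below groups made-up (chromosome, gene) records both ways. With a plain dict the missing key must be handled explicitly; defaultdict(list) creates the empty list on first access.

```python
from collections import defaultdict

records = [('chr1', 'geneA'), ('chr2', 'geneB'), ('chr1', 'geneC')]

# Using dict: check for the key before appending
genes_dict = {}
for chrom, gene in records:
    if chrom not in genes_dict:
        genes_dict[chrom] = []
    genes_dict[chrom].append(gene)

# Using defaultdict: missing keys get an empty list automatically
genes_dd = defaultdict(list)
for chrom, gene in records:
    genes_dd[chrom].append(gene)

print(genes_dd['chr1'])  # ['geneA', 'geneC']
```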
Use generators
Example: read a large FASTA file
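A minimal sketch of such a generator, assuming a plain FASTA layout (a header line starting with '>' followed by one or more sequence lines). The function name read_fasta is hypothetical. Because records are yielded one at a time, only the current record is held in memory.

```python
def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA file-like object."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header: emit the previous record, if any
            if name is not None:
                yield name, ''.join(chunks)
            name = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Emit the last record
    if name is not None:
        yield name, ''.join(chunks)


# Typical use with a file on disk (the path is a placeholder):
# with open('sequences.fa') as f:
#     for name, seq in read_fasta(f):
#         print(name, len(seq))
```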
Turn off annoying KeyboardInterrupt and BrokenPipe Error
Without exception handling (press Ctrl+C):
With exception handling (press Ctrl+C):
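One possible way to suppress the tracebacks is to wrap the entry point in a try/except block. The helper run_quietly and the exit code 1 are assumptions made for this sketch; checking errno.EPIPE on IOError covers both Python 2 and Python 3, where BrokenPipeError is a subclass of OSError (aliased as IOError).

```python
import errno
import sys


def run_quietly(func, *args):
    """Call func(*args) and return an exit code instead of a traceback."""
    try:
        func(*args)
    except KeyboardInterrupt:
        # Ctrl+C was pressed
        return 1
    except IOError as e:
        # Raised when the reader of a pipe exits early, e.g. "script.py | head"
        if e.errno == errno.EPIPE:
            return 1
        raise  # any other I/O error is still reported
    return 0


# In a script with a main() function:
# sys.exit(run_quietly(main))
```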
Class and instance variables
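A short sketch of the distinction, using a hypothetical Sequence class: a class variable is shared by all instances, while an instance variable belongs to one object. Assigning through an instance creates an instance attribute that shadows the class variable for that object only.

```python
class Sequence(object):
    count = 0  # class variable: one value shared by all instances

    def __init__(self, name):
        self.name = name     # instance variable: one value per object
        Sequence.count += 1  # update the shared class variable


a = Sequence('seq1')
b = Sequence('seq2')
print(Sequence.count, a.count, b.count)  # 2 2 2

a.count = 10  # creates an instance attribute on a; the class variable is unchanged
print(Sequence.count, a.count, b.count)  # 2 10 2
```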
Useful Python packages for data analysis
Browser-based interactive programming in Python: jupyter
URL: (http://jupyter.org/)
Start jupyter notebook
Jupyter notebooks manager

Jupyter process manager

Jupyter notebook

Integrate with matplotlib

Browser-based text editor

Browser-based terminal

Display image

Display dataframe

Display audio

Embedded markdown

Python packages for scientific computing

Vector arithmetic: numpy
URL: (http://www.numpy.org/)
Example:
Numerical analysis (probability distribution, signal processing, etc.): scipy
URL: (https://www.scipy.org/)
scipy.stats contains a large number of probability distributions:
Unified interface for all probability distributions:
Just-in-time (JIT) compiler for vector arithmetic: numba
URL: (https://numba.pydata.org/)
Compile Python for-loops to native code to achieve performance similar to C/C++ code. Example:
Library for symbolic computation: sympy
URL: (http://www.sympy.org/en/index.html)

Operation on data frames: pandas
URL: (http://pandas.pydata.org/pandas-docs/stable/)
Example:
Basic graphics and plotting: matplotlib
URL: (https://matplotlib.org/contents.html)

Statistical data visualization: seaborn
URL: (https://seaborn.pydata.org/)

Interactive programming in Python: ipython
URL: (http://ipython.org/ipython-doc/stable/index.html)

Statistical tests: statsmodels
URL: (https://www.statsmodels.org/stable/index.html)

Machine learning algorithms: scikit-learn
URL: (http://scikit-learn.org/)

Example:
Natural language analysis: gensim
URL: (https://radimrehurek.com/gensim/)
HTTP library: requests
URL: (http://docs.python-requests.org/en/master/)

Lightweight Web framework: flask
URL: (http://flask.pocoo.org/)

Deep learning framework: tensorflow
URL: (http://tensorflow.org/)
High-level deep learning framework: keras
URL: (https://keras.io/)
Operation on sequence and alignment formats: biopython
URL: (http://biopython.org/)
Operation on genomic formats (BigWig, etc.): bx-python
Operation on HDF5 files: h5py
URL: (https://www.h5py.org/)
Save data to an HDF5 file:
Read data from an HDF5 file:
Mixed C/C++ and python programming: cython
URL: (http://cython.org/)
Progress bar: tqdm
URL: (https://pypi.python.org/pypi/tqdm)


Example Python scripts
View a table in a pretty way
The original table is ugly:
Output:
Now display the table more clearly:
Output:
You can also get some help by typing tvi -h:
tvi.py
Generate a random FASTA file
seqgen.py
Weekly tasks
All files you need for completing the tasks can be found at: weekly_tasks.zip
Task 1: run examples (Python tips, numpy, pandas) in this tutorial
Install Anaconda on your PC. Try to understand the example code and run it in Jupyter or IPython.
Task 2: write a Python program to convert a GTF file to BED12 format
Please refer to (https://genome.ucsc.edu/FAQ/FAQformat.html#format1) for BED12 format and refer to (https://www.ensembl.org/info/website/upload/gff.html) for GTF format.
GTF example:
BED12 example:
The GTF file is weekly_tasks/gencode.v27.long_noncoding_RNAs.gtf.
Each line in the output file is a transcript with the 4th column as the transcript ID.
The version number of the transcript ID should be stripped (e.g. ENST00000473358.1 => ENST00000473358).
The output file is sorted first by transcript IDs and then by chromosome in lexicographical order.
Columns 5, 7, 8, 9 in the BED12 file should be set to 0.
Please do NOT use any external tools (e.g. sort, awk, etc.) in your program other than Python.
An example output can be found in weekly_tasks/transcripts.bed.
Hint: use dict, list, tuple, str.split, re.match, sorted.
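As a starting point only, the sketch below shows one way to pull the transcript ID out of a GTF attribute column and strip its version number. The function names and the sample attribute string are hypothetical, and re.search is used here in place of the re.match mentioned in the hint; the rest of the conversion is left to you.

```python
import re


def strip_version(transcript_id):
    """Remove the version suffix: ENST00000473358.1 -> ENST00000473358."""
    return transcript_id.split('.')[0]


def get_transcript_id(attributes):
    """Extract the unversioned transcript ID from a GTF attribute column.

    Returns None if the column has no transcript_id field.
    """
    m = re.search(r'transcript_id "([^"]+)"', attributes)
    if m is None:
        return None
    return strip_version(m.group(1))


# Hypothetical attribute column, for illustration only
attributes = 'gene_id "GENE_EXAMPLE.1"; transcript_id "ENST00000473358.1";'
print(get_transcript_id(attributes))  # ENST00000473358
```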
Task 3: write a Python program to add a prefix to all directories
Each prefix is a two-digit number starting from 00, followed by '-'. Numbers less than 10 are padded with a leading '0'.
The files/directories should be numbered according to the lexicographical order.
For example, if the original directory structure is:
then you should get the following directory structure after renaming:
The original directories can be found in weekly_tasks/original_dirs.
The root directory (i.e. original_dirs) should not be renamed.
You can use the tree command to display the directory structure as shown above.
An example result can be found in weekly_tasks/renamed_dirs.
Hint: use os.listdir, os.rename, str.format, sorted, yield.
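The naming rule alone can be sketched as follows: sort the names, then format each index as a zero-padded two-digit number. The function name add_prefixes and the sample names are hypothetical; walking the directory tree and calling os.rename are left as the exercise.

```python
def add_prefixes(names):
    """Return the names with a two-digit prefix, numbered in sorted order."""
    # {:02d} pads the number with a leading zero to at least two digits
    return ['{:02d}-{}'.format(i, name)
            for i, name in enumerate(sorted(names))]


print(add_prefixes(['b', 'a', 'c']))  # ['00-a', '01-b', '02-c']
```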