5.Python

Python Tutorial

life_is_short

Install Anaconda Python

URL: (https://www.anaconda.com/download/)

  • Easy install of data science packages (binary distribution)

  • Package management with conda

anaconda_packages

Install Python packages using conda:

Update a package to the latest version:

Install Python packages using pip:

Update a package using pip:

Python language tips

Compatibility between Python 3.x and Python 2.x

Biggest difference: print is a function rather than a statement in Python 3

This does not work in Python 3

Solution: use the __future__ module

Second biggest difference: some package/function names in the standard library were renamed

Python 2 => Python 3

Solution: use the six module

Get away from IndentationError

Python uses indentation (tabs or spaces) to delimit code blocks, so inconsistent indentation is an error

Best practice: always use 4 spaces. You can configure your editor to indent with spaces (soft tabs) or tabs.

In the vim editor, use :set list to make tab characters visible so that mixed tabs and spaces can be spotted.

Add Shebang and encoding at the beginning of executable scripts

Create a file named welcome.py
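A minimal sketch of what welcome.py might contain. The greeting text and the `greeting` helper are illustrative assumptions, not the tutorial's original script. The first line is the shebang; the second declares the source encoding, which matters in Python 2 for non-ASCII literals (Python 3 already defaults to UTF-8).

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""welcome.py: a tiny executable script (illustrative content)."""


def greeting():
    # Hypothetical helper: returns the message the script prints.
    return 'Welcome to Python!'


if __name__ == '__main__':
    print(greeting())
```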

Then make the Python script executable:

Now you can run the script without specifying the Python interpreter:

All variables, functions, classes are dynamic objects

All Python variables are references to objects

Use deepcopy if you really want to COPY a variable
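A small sketch of the difference between binding a second name, making a shallow copy, and making a deep copy. The variable names are arbitrary.

```python
import copy

a = [[1, 2], [3, 4]]
b = a                  # b is another name for the same list object
c = copy.copy(a)       # shallow copy: new outer list, shared inner lists
d = copy.deepcopy(a)   # deep copy: the inner lists are copied too

a[0][0] = 100
print(b[0][0])  # 100, because b and a are the same object
print(c[0][0])  # 100, because the inner list is still shared
print(d[0][0])  # 1, the deep copy is fully independent
```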

What if I accidentally overwrite my builtin functions?

You can refer to (https://docs.python.org/2/library/functions.html) for builtin functions in the standard library.

Note: in Python 3, you should import from builtins rather than __builtin__
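One possible way to recover, sketched for Python 3: the original function is still reachable through the builtins module, so the shadowed name can simply be rebound. The variable names here are illustrative.

```python
import builtins

values = [1, 2, 3]
sum = 0                 # oops: the name `sum` now refers to an int

# Calling sum(values) here would raise TypeError: 'int' object is not callable.
sum = builtins.sum      # rebind the name to the original built-in function
total = sum(values)
print(total)            # 6
```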

int is of arbitrary precision in Python!

In Python:
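A quick illustration (Python 3 shown; the exponent is an arbitrary choice): integers grow as needed instead of overflowing at 64 bits.

```python
big = 2 ** 100
print(big)               # 1267650600228229401496703205376
print(big.bit_length())  # 101 bits, well beyond a 64-bit machine integer
```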

In R:

Easiest way to swap values of two variables

In C/C++:

In Python:
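A minimal sketch of the idiom: tuple packing and unpacking swaps two names without a temporary variable.

```python
a, b = 1, 2
a, b = b, a   # the right-hand side is evaluated first, then unpacked
print(a, b)   # 2 1
```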

List comprehension

Use for-loops:

Use list comprehension
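For comparison, a sketch that builds the same list both ways. The task (squares of the even numbers below 5) is an arbitrary example.

```python
# Explicit for-loop
squares_loop = []
for x in range(5):
    if x % 2 == 0:
        squares_loop.append(x * x)

# Equivalent list comprehension
squares_comp = [x * x for x in range(5) if x % 2 == 0]

print(squares_comp)  # [0, 4, 16]
```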

Dict comprehension

Use for-loops:

Use dict comprehension:
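A sketch of the same contrast for dictionaries. The word list is made up for illustration.

```python
words = ['apple', 'kiwi', 'banana']

# Explicit for-loop
lengths_loop = {}
for w in words:
    lengths_loop[w] = len(w)

# Equivalent dict comprehension
lengths_comp = {w: len(w) for w in words}

print(lengths_comp['banana'])  # 6
```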

For the one-liners

Use ';' instead of '\n':

For more examples of one-liners, please refer to (https://wiki.python.org/moin/Powerful%20Python%20One-Liners).

Read from standard input
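One common pattern, sketched below, is to iterate over sys.stdin line by line. The `count_lines` function is a hypothetical helper, and an io.StringIO object stands in for real standard input so the example runs anywhere.

```python
import io
import sys


def count_lines(stream=None):
    """Count the lines of a text stream, defaulting to standard input."""
    if stream is None:
        stream = sys.stdin
    n = 0
    for line in stream:   # file-like objects yield one line at a time
        n += 1
    return n


# io.StringIO plays the role of sys.stdin here.
print(count_lines(io.StringIO('a\nb\nc\n')))  # 3
```

In a real script you would call count_lines() with no argument and pipe data in from the shell.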

The order of dict keys may NOT be what you expect (insertion order is guaranteed only from Python 3.7 onward)
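A sketch of two ways to get a predictable order regardless of Python version: sort the keys explicitly, or use collections.OrderedDict to remember insertion order. The sample data is arbitrary.

```python
from collections import OrderedDict

counts = {'b': 2, 'c': 3, 'a': 1}

# Sorting the keys gives a deterministic order on every Python version.
sorted_keys = sorted(counts)
print(sorted_keys)  # ['a', 'b', 'c']

# OrderedDict remembers the order in which keys were inserted.
ordered = OrderedDict([('b', 2), ('c', 3), ('a', 1)])
print(list(ordered))  # ['b', 'c', 'a']
```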

Use enumerate() to add a number during iteration
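A minimal sketch: enumerate() yields (index, item) pairs, and the optional start argument sets the first index. The names are placeholders.

```python
names = ['alice', 'bob', 'carol']

for i, name in enumerate(names, start=1):
    print(i, name)

numbered = list(enumerate(names, start=1))
# [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```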

Reverse a list
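Three common options, sketched below: slicing and reversed() leave the original list untouched, while list.reverse() modifies it in place.

```python
items = [1, 2, 3]

rev_slice = items[::-1]           # new list via slicing
rev_iter = list(reversed(items))  # reversed() returns an iterator

items.reverse()                   # in place; returns None
print(items)                      # [3, 2, 1]
```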

Strings are immutable in Python
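A short sketch of what immutability means in practice: item assignment fails, so a modified string has to be built as a new object.

```python
s = 'hello'
assignment_failed = False

try:
    s[0] = 'J'            # strings do not support item assignment
except TypeError:
    assignment_failed = True

t = 'J' + s[1:]           # build a new string instead
print(s, t)               # hello Jello
```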

Tuples are hashable (if all their elements are), while lists are not
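A sketch of the practical consequence: a tuple can serve as a dict key or set member, whereas a list raises TypeError. The keys and values are made-up examples.

```python
positions = {('chr1', 100): 'geneA'}   # a tuple works as a dict key
list_key_failed = False

try:
    positions[['chr1', 200]] = 'geneB'  # a list is unhashable
except TypeError:
    list_key_failed = True

print(positions[('chr1', 100)])  # geneA
```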

Use itertools

Nested loops in a more concise way:

Get all combinations of a list:
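A sketch of both uses: itertools.product replaces nested for-loops, and itertools.combinations enumerates all subsets of a given size. The inputs are arbitrary.

```python
import itertools

# Equivalent to: for letter in 'AB': for number in [1, 2]: ...
pairs = list(itertools.product('AB', [1, 2]))
print(pairs)   # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]

# All 2-element combinations, in input order
combos = list(itertools.combinations(['a', 'b', 'c'], 2))
print(combos)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```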

Convert iterables to lists
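A sketch for Python 3, where range(), map(), zip() and similar functions return lazy objects rather than lists. Wrapping them in list() materializes the values. Note that an iterator such as map() can be consumed only once.

```python
numbers = list(range(3))
print(numbers)          # [0, 1, 2]

upper = map(str.upper, ['a', 'b'])
first_pass = list(upper)
second_pass = list(upper)   # the iterator is already exhausted
print(first_pass)       # ['A', 'B']
print(second_pass)      # []
```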

Use the zip() function to transpose nested lists/tuples/iterables
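A minimal sketch: unpacking the rows with * passes each row as a separate argument to zip(), which then groups the items by column.

```python
rows = [(1, 2, 3), (4, 5, 6)]
cols = list(zip(*rows))
print(cols)  # [(1, 4), (2, 5), (3, 6)]
```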

Global and local variables
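A sketch of the scoping rule, assuming the code runs at module level: assigning to a name inside a function creates a local variable unless the name is declared with the global keyword. The function names are illustrative.

```python
counter = 0


def set_local():
    counter = 1        # a new local variable; the global is untouched
    return counter


def increment_global():
    global counter     # refer to the module-level name
    counter += 1
    return counter


set_local()
print(counter)         # 0
increment_global()
print(counter)         # 1
```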

Use defaultdict

Use dict:

Use defaultdict:
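A sketch comparing the two approaches on a made-up grouping task: with a plain dict the key must be initialized before appending, while defaultdict(list) creates the empty list on first access.

```python
from collections import defaultdict

pairs = [('chr1', 'geneA'), ('chr2', 'geneB'), ('chr1', 'geneC')]

# Plain dict: check for the key first
plain = {}
for chrom, gene in pairs:
    if chrom not in plain:
        plain[chrom] = []
    plain[chrom].append(gene)

# defaultdict: missing keys get a fresh list automatically
grouped = defaultdict(list)
for chrom, gene in pairs:
    grouped[chrom].append(gene)

print(grouped['chr1'])  # ['geneA', 'geneC']
```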

Use generators

Example: read a large FASTA file
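A minimal sketch of a generator-based FASTA reader, not necessarily the tutorial's original code. The `read_fasta` name is an assumption, and an in-memory io.StringIO object stands in for a large file. Because records are yielded one at a time, only the current record is held in memory.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith('>'):
            if name is not None:
                yield name, ''.join(chunks)
            name, chunks = line[1:], []   # start a new record
        else:
            chunks.append(line)
    if name is not None:
        yield name, ''.join(chunks)       # emit the last record


fasta = io.StringIO('>seq1\nACGT\nTTGA\n>seq2\nGGCC\n')
records = list(read_fasta(fasta))
print(records)  # [('seq1', 'ACGTTTGA'), ('seq2', 'GGCC')]
```

With a real file, pass the object returned by open() instead of the StringIO stand-in.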

Suppress annoying KeyboardInterrupt and broken pipe errors

Without exception handling (press Ctrl+C):

With exception handling (press Ctrl+C):

Class and instance variables
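A sketch of the distinction, using a hypothetical Gene class: class variables are shared by all instances, while attributes set on self belong to one instance. Assigning through an instance creates a new instance attribute that shadows the class variable.

```python
class Gene(object):
    count = 0                  # class variable, shared by all instances

    def __init__(self, name):
        self.name = name       # instance variable, one per object
        Gene.count += 1


a = Gene('TP53')
b = Gene('BRCA1')
print(Gene.count, a.count, b.count)  # 2 2 2

a.count = 10                   # creates an instance attribute on a only
print(Gene.count, a.count, b.count)  # 2 10 2
```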

Useful Python packages for data analysis

Browser-based interactive programming in Python: jupyter

URL: (http://jupyter.org/)

Start jupyter notebook

Jupyter notebooks manager

jupyter_main

Jupyter process manager

jupyter_processes

Jupyter notebook

jupyter_notebook

Integrate with matplotlib

jupyter_matplotlib

Browser-based text editor

jupyter_text_editor

Browser-based terminal

jupyter_termimal

Display image

jupyter_image

Display dataframe

jupyter_dataframe

Display audio

jupyter_audio

Embedded markdown

jupyter_markdown

Python packages for scientific computing

scipy ecosystem

Vector arithmetics: numpy

URL: (http://www.numpy.org/)

Example:

Numerical analysis (probability distribution, signal processing, etc.): scipy

URL: (https://www.scipy.org/)

scipy.stats contains a large number of probability distributions: scipy_stats

Unified interface for all probability distributions: scipy_stats_distribution

Just-in-time (JIT) compiler for vector arithmetics: numba

URL: (https://numba.pydata.org/)

Compile Python for-loops to native code to achieve performance similar to C/C++ code. Example:

Library for symbolic computation: sympy

URL: (http://www.sympy.org/en/index.html)

sympy

Operation on data frames: pandas

URL: (http://pandas.pydata.org/pandas-docs/stable/)

Example:

Basic graphics and plotting: matplotlib

URL: (https://matplotlib.org/contents.html)

matplotlib

Statistical data visualization: seaborn

URL: (https://seaborn.pydata.org/)

seaborn

Interactive programming in Python: ipython

URL: (http://ipython.org/ipython-doc/stable/index.html)

ipython

Statistical tests: statsmodels

URL: (https://www.statsmodels.org/stable/index.html)

statsmodels

Machine learning algorithms: scikit-learn

URL: (http://scikit-learn.org/)

scikit-learn

Example:

Natural language analysis: gensim

URL: (https://radimrehurek.com/gensim/)

HTTP library: requests

URL: (http://docs.python-requests.org/en/master/)

requests

Lightweight Web framework: flask

URL: (http://flask.pocoo.org/)

flask

Deep learning framework: tensorflow

URL: (http://tensorflow.org/)

High-level deep learning framework: keras

URL: (https://keras.io/)

Operation on sequence and alignment formats: biopython

URL: (http://biopython.org/)

Operation on genomic formats (BigWig, etc.): bx-python

Operation on HDF5 files: h5py

URL: (https://www.h5py.org/)

Save data to an HDF5 file:

Read data from an HDF5 file:

Mixed C/C++ and Python programming: cython

URL: (http://cython.org/)

Progress bar: tqdm

URL: (https://pypi.python.org/pypi/tqdm)

tqdm
tqdm_notebook

Example Python scripts

View a table in a pretty way

The original table is ugly:

Output:

Now display the table more clearly:

Output:

You can also get some help by typing tvi -h:

tvi.py

Generate a random FASTA file

seqgen.py

Weekly tasks

All files you need for completing the tasks can be found at: weekly_tasks.zip

Task 1: run examples (Python tips, numpy, pandas) in this tutorial

Install Anaconda on your PC. Try to understand the example code and run it in Jupyter or IPython.

Task 2: write a Python program to convert a GTF file to BED12 format

  • Please refer to (https://genome.ucsc.edu/FAQ/FAQformat.html#format1) for BED12 format and refer to (https://www.ensembl.org/info/website/upload/gff.html) for GTF format.

    GTF example:

    BED12 example:

  • The GTF file is weekly_tasks/gencode.v27.long_noncoding_RNAs.gtf.

  • Each line in the output file is a transcript, with the 4th column as the transcript ID.

  • The version number of the transcript ID should be stripped (e.g. ENST00000473358.1 => ENST00000473358).

  • The output file is sorted first by transcript IDs and then by chromosome in lexicographical order.

  • Columns 5, 7, 8, and 9 in the BED12 file should be set to 0.

  • Please do NOT use any external tools (e.g. sort, awk, etc.) in your program other than Python.

  • An example output can be found in weekly_tasks/transcripts.bed.

Hint: use dict, list, tuple, str.split, re.match, sorted.

Task 3: write a Python program to add a prefix to all directories

  • Each prefix is a two-digit number starting from 00, followed by '-'. Numbers less than 10 should be left-padded with a single '0'.

  • The files/directories should be numbered according to the lexicographical order.

    For example, if the original directory structure is:

    then you should get the following directory structure after renaming:

  • The original directories can be found in weekly_tasks/original_dirs.

  • The root directory (i.e. original_dirs) should not be renamed.

  • You can use the tree command to display the directory structure as shown above.

  • An example result can be found in weekly_tasks/renamed_dirs.

    Hint: use os.listdir, os.rename, str.format, sorted, yield.

Video

@Youtube

@Bilibili
