Introduction to Python for Data Analysis and Automation in Biology


Technical University of Denmark


General course objectives:
Get students to adopt Python in their research.

Learning objectives:
A student who has met the objectives of the course will be able to:
  • Use the Unix shell for working with files and directories, pipes and filters, loops, shell scripts, and searching.
  • Use Python for data analysis and task automation, including the import of libraries, reading and plotting of data, selection and filtering of data, writing of conditional statements and functions, and debugging.
  • Utilize basic version control of data and programming code with Git.
  • Adopt a modern development and reporting environment for Python in the form of Jupyter notebooks.
  • Clean, filter, transform and summarize tabular data with Pandas.
  • Visualize data using the Python plotting libraries Matplotlib and Altair.
  • Apply scikit-learn for basic machine learning such as classification, regression, clustering and PCA.
  • Apply Biopython for basic DNA sequence handling.
  • Simulate and plan experiments involving the creation of recombinant DNA using pydna.
  • Perform basic image processing using scikit-image.

Contents:
With data generation and genetic engineering becoming ever easier in biology, life scientists and bioengineers increasingly face challenges in processing and analyzing data and in automating experimental workflows. For example, simple tasks (such as designing primers) can become a huge drain on scientists’ time as they repetitively copy and paste information into web interfaces instead of running batch operations. Furthermore, the qualifications demanded of biotechnologists in industry are shifting away from pipetting towards the analysis of data and the automation of workflows. Therefore, it is essential that life science and biotechnology PhD students are trained in the computational tools needed for data analysis and task/lab automation.
This PhD course aims to get programming novices (little to no experience) off the ground with adopting Python (instead of Excel and Word) in their daily work. In contrast to many existing Python courses targeting computer scientists and software engineers, this course is specifically tailored towards biotechnology. It focuses primarily on Python as a tool for data analysis and automation, de-emphasizing parts that are relevant to software development only. Furthermore, participants are provided with knowledge about data analytics and relevant machine learning methods, including best-practice approaches, troubleshooting and avoiding common pitfalls.
This course is based on the Software and Data Carpentry curricula (https://carpentries.org) and style of teaching (live coding, hands-on exercises, etc.). Since 1998, Software Carpentry has been teaching basic lab skills for research computing to scientists and engineers, and its course materials have continuously been adapted and tailored to their problems and needs.
The materials for this course have been extensively tailored by us to life science and biotech problems that can be solved with Python, and they specifically target life science and biotech PhD students. The course is 100% interactive and relies on the proven approach of teachers conveying the knowledge through live coding while the participants follow along (supported by teaching assistants). Furthermore, live coding is frequently interrupted by hands-on exercises in which the participants develop programming solutions to appropriate tasks on their own (with the help of the teachers and teaching assistants). This course will provide you with the theoretical and practical knowledge to:
  • Obtain a working knowledge of Python basics and fundamentals relevant to data analysis and automation.
  • Adopt a modern development and reporting environment for Python in the form of Jupyter notebooks.
  • Obtain a good overview of key Python libraries covering bioinformatics/sequence analysis (Biopython, pydna), data analysis and statistics (Pandas), machine learning (scikit-learn), and image processing (scikit-image).
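As an illustration of the kind of batch operation the description has in mind (handling many primers in one go instead of pasting them one by one into a web interface), here is a minimal sketch in plain Python. It is not taken from the course materials: the primer names and sequences are made up, the helper functions are hypothetical, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides.

```python
# Hypothetical primer sequences, invented purely for illustration.
PRIMERS = {
    "fwd_1": "ATGCGTACGTTAGC",
    "rev_1": "GGCCTTAAGCGC",
}


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C (Wallace rule: 2*(A+T) + 4*(G+C))."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def summarize(primers):
    """Compute length, GC content and estimated Tm for every primer at once."""
    return [
        {
            "name": name,
            "length": len(seq),
            "gc": gc_content(seq),
            "tm": wallace_tm(seq),
        }
        for name, seq in primers.items()
    ]


if __name__ == "__main__":
    # One loop replaces many manual copy-and-paste steps.
    for row in summarize(PRIMERS):
        print(f"{row['name']}: {row['length']} nt, "
              f"GC {row['gc']:.0%}, Tm ~{row['tm']} C")
```

In practice, libraries named in this description (for example Biopython) offer more refined sequence utilities; the point of the sketch is only that a few lines of Python can process an entire list of sequences in a single run.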

Course organizer
Kai Kristof , Marjan
Place/Venue
Anker Engelunds Vej 1
City
2800 Kgs. Lyngby
Country
Denmark
Workload
2.5
Link
http://kurser.dtu.dk/course/29905