Over the course of the term, students will work on a term-long project in groups of 3 or 4.

The objective of each project is to leverage rich, high-quality datasets to address open problems in the health domain. Project tasks can include data mining, modeling, prediction, classification, etc., but most importantly, projects should aim to advance the state of the art in the research literature or in practice.

To get started, see strategies and resources for finding a research dataset.


  1. In-class presentation

  2. A written report (this can be a publishable paper written to submit to a fitting venue or a report written strictly for this course). In either case, the paper should be written using a target venue's paper template and should follow the appropriate guidelines provided by a relevant journal/conference. See guidelines below.

  3. A project website to document each project and progress. This website will serve as a final portfolio for the work that is done throughout the term. Some examples from previous terms are linked below:

Guidelines for Final Paper

Every project group is expected to write a final paper to share their research results and findings. There will be writing milestones due throughout the term to keep teams on track with writing. Each team should use the paper template provided by the venue below and write their report in accordance with its "guidelines for authors":

Alternative venues can be selected based on the team project and preference. Other example venues are:

Important Notes:

  • Manuscript length: ~10-12 pages, not including references. This guideline is for teams writing for ACM HEALTH, for which no page limit is given.

  • Organization: Every manuscript must follow instructions provided by the selected venue. An example of submission guidelines for ACM HEALTH can be found here.

  • Template: All papers should use the appropriate template provided by the selected venue. An example of such a template for ACM HEALTH can be found here.

  • Reference Papers: It is always a good idea to have a few example papers from the selected publication venue to use as a reference while writing your own paper. Some example reference papers for ACM HEALTH can be found here.

  • LaTeX: All final papers should be written using LaTeX. Each project team should use Overleaf, an online, collaborative LaTeX editor.

Project Milestones

There will be several milestones to track progress of the project throughout the term:

    • P1 (7%): Exploratory & Initial Analysis (~week 3)

    • P2 (8%): Introduction & Related Work (~week 4 - 5)

    • P3 (13%): Method & Initial Results (~week 6 - 7)

    • P4 (22%): Final Presentation & Final Paper (week 9)

P1 (7%) - Exploratory & Initial Analysis

Exploratory data analysis (EDA) is a critical and often neglected step in data analysis. In this assignment, students should conduct EDA suited to their dataset, with the primary goal of understanding the dataset fully and identifying the types of research questions it can answer. Some guidelines on conducting EDA can be found on the resources page.
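As a rough starting point for the EDA described above, the sketch below computes a per-column summary (missing-value counts and basic statistics) before any figures are drawn. It is only an illustration: the inline sample data, its column names, and the `summarize` helper are hypothetical and stand in for your own dataset and code.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a real health dataset;
# the column names and values are illustrative only.
SAMPLE_CSV = """patient_id,age,systolic_bp,smoker
1,34,118,no
2,51,,yes
3,47,135,no
4,,142,yes
5,62,128,no
"""


def summarize(rows, columns):
    """Return missing-value counts and basic statistics for each column."""
    summary = {}
    for col in columns:
        raw = [row[col] for row in rows]
        present = [v for v in raw if v != ""]
        info = {"n": len(raw), "missing": len(raw) - len(present)}
        try:
            nums = [float(v) for v in present]
        except ValueError:
            # Non-numeric column: report the distinct values instead.
            info["distinct"] = sorted(set(present))
        else:
            if nums:
                info["mean"] = statistics.mean(nums)
                info["median"] = statistics.median(nums)
                info["min"] = min(nums)
                info["max"] = max(nums)
        summary[col] = info
    return summary


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
report = summarize(rows, reader.fieldnames)
for col, info in report.items():
    print(col, info)
```

A summary like this helps decide which dimensions deserve a figure, for example columns with many missing values or a wide range. In a notebook, pandas offers the same view through `DataFrame.describe()` and `DataFrame.isna().sum()`.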

Assignment Requirements:

  1. Accept the assignment here and ensure that all members of your team are added to your project repository.

  2. Write clear and clean code (in Python using Jupyter Notebook/Google Colab) with appropriate comments and section titles to create 10 or more descriptive figures for exploring various dimensions of your research dataset.

  3. Create your project website using a freely available service (e.g., Google Sites) that includes:

      • Title

      • Group Members

      • Objective (What is the goal of this project?)

      • Innovation (1-paragraph description of why this work is innovative; you must support this with citations/references to related work in the literature)

      • Data Description (1-paragraph description of the dataset and its important features)

      • Exploratory Analysis (embed the written code from #2 here, either as a .pdf or directly on the website)

      • References (using an appropriate citation format)

  4. Submit a link to your project website and GitHub repo via Canvas.

      • Ensure that your website is publicly accessible through the link submitted, especially if you use Google Sites.

Need some inspiration?

See examples from previous terms below:

P2 (8%) - Introduction & Related Work

This assignment is the first milestone toward writing your own research paper for your course project. Regardless of whether your team plans for a publishable paper or class report, this paper should be written using the appropriate template for a journal/conference in the space. See the Guidelines for the Final Paper for additional instructions.

Assignment Requirements

Write the Introduction & Related Work sections (~2 pages) of your research paper for your course project.

Guidelines on items to address are below:

  1. Why is the problem space important?

  2. What specific gap exists in the space?

  3. Describe related work in the space (~8 or more other papers that attempted to address the identified problem or similar problems in the space).

  4. What is your own research objective(s)?

  5. Provide a brief description (1 - 2 sentences) of how you plan to accomplish the stated objectives.

  6. What are the key contributions of your work?


  • Numbers 1 - 3 above must be supported with references.

  • If writing a research paper is new to everyone in your project group, please reach out to the teaching team for additional guidance. We would be glad to help!

P3 (13%): Method & Initial Results

In this assignment, you will continue implementing data science methods toward your project objective/goal. Then you will write the methods and results sections of your paper (continuing with the template you used in P2).

Assignment Requirements:

  1. Write clear and clean code (in Python using Jupyter Notebook/Google Colab) with appropriate comments and section titles to implement methods toward your project objective/goal.

  2. Upload the written code and any supporting items to your GitHub repo under a clearly named directory (e.g. P3 - Method & Results).

  3. Write the methods and results sections of your project report/paper. See guidelines for writing each section below.

  4. Submit a .pdf of your paper on Canvas.

Guidelines to Consider for the Methods section:

  • Start with a data description subsection. Describe key attributes of your dataset that are important for understanding your methods and results.

  • Break down your method into smaller components and describe each component in its own subsection. For example, data cleaning -> feature extraction -> feature selection -> classification.

  • Consider creating a flowchart of your full methodology and approach for analysis, starting from the raw data to the output. Include such a flowchart in your methods section.

  • Make sure to describe each out-of-the-box method used, even if only briefly. For example, if you use lasso regression, don’t assume readers know what it is. Start by describing lasso regression at a high level and/or with equations, then cite references where more details can be found.

  • Look at examples from papers we have read in class or student papers from DS4H last year (see the deliverables above).

  • Make sure you are writing in the format of a publishable paper and not a class project report. The language and style of these two are quite different.
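As an illustration of the lasso example in the guidelines above, a brief equation-level description could look like the following. The notation is an assumption made for this sketch: $X \in \mathbb{R}^{n \times p}$ is the feature matrix for $n$ samples and $p$ features, $y \in \mathbb{R}^{n}$ is the outcome vector, $\beta$ is the coefficient vector, and $\lambda \ge 0$ is the regularization strength.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1
```

In words: lasso fits a linear model while penalizing the sum of the absolute values of the coefficients. Larger values of $\lambda$ shrink more coefficients to exactly zero, which is why the method is often used for feature selection. One or two sentences like these, followed by a citation, are usually enough in a methods section.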

Guidelines to Consider for the Results section:

  • Start with key result(s) or finding(s) from your analysis, then move into the less significant results.

  • Summarize the full results using 1 or 2 tables and/or figures. Ensure these are legible, e.g. legible axis labels, etc.

  • Make sure you have text/paragraphs dedicated to describing what should be seen in, or taken away from, your tables/figures.

  • Before writing, look at examples from other papers and/or student papers from DS4H last year (see the deliverables above).

P4: Final Presentation (7%)

The final presentations will be on Tuesday (11/8) and Thursday (11/10) during our regular class time. All presentations should be 15 minutes long, with 5 minutes for Q&A.

The presentation should include the following:

  1. Title, Authors/Presenters

  2. Motivation/Background (why should the audience care?)

  3. Research objective (what is the specific goal of the work? why is it important?)

  4. Data Description/Summary (use text and visuals - this is a good place for some of your exploratory analysis)

  5. Methodology (think flowchart if there are multiple steps, give grounded rationale for the approach taken)

  6. Results (key findings/takeaways)

  7. Limitations/Challenges (including how you would address these in future work)

  8. Top learnings from the project experience

  9. References (on appropriate slides)

Things to consider:

  • The grading scale is as follows: A+ (100%), A- (94%), B+ (88%), B- (82%), C+ (76%), C- (70%), Less than C-.

  • You are the authors of the paper, the researchers behind the work, the experts on the topic. Make sure to present accordingly.

  • You are not graded based on whether you achieved good/bad results. Instead, you are graded on the soundness of your approach, knowledge of the space, and ability to communicate the work.

  • Use good presentation practice (for example: slides should not be too busy with text and/or visuals, figures should be legible with clear axis labels and legends, etc.)

  • There is a strict time limit. It's a good idea to practice your talk beforehand.

  • Have fun! If you're not enthusiastic talking about your own work, then chances are the audience won't be enthusiastic listening.

  • Presentation order is TBD.

P4: Final Paper (15%)

The final paper is due on Friday (11/11). This should be a fully polished version that includes revisions based on comments received in prior submissions and other improvements that your team has identified.

The final paper should include sections for introduction, related work, methods, and results per P2 & P3. In addition, this version should have a newly added discussion section. The discussion section should include implications of your results/findings, comparison with results from related studies, limitations of the work presented, and directions for future research.

As you finalize your paper, be sure to leverage examples such as this one or other examples listed under P3.

Additional Requirements:

  • Update your project website to tell the full story of your project and findings. Be sure to include your final paper, final presentation slides, and a link to your GitHub repo with the final version of your written code (which must be well organized and commented).

  • Submit a link to your project website as a comment on Canvas.

  • Submit a PDF of your final paper on Canvas.

Note: You are encouraged to ask the teaching team for input as you finalize your project and paper. While we won't read your full paper before submission, we can read short sections and/or provide input on how to strengthen the analysis and/or written presentation.