I completed the 10-course Data Science Specialization from Johns Hopkins University on Coursera.
Here is my specialization certificate:
https://www.coursera.org/account/accomplishments/specialization/JKSAW82GLH35
## Links
- Shiny Demo Application: https://technicalelvis.shinyapps.io/shiny_demo_word_predictor/
- Demo Slides: https://telvis07.github.io/slides_demo_word_predictor/#1
- Milestone Report: http://rpubs.com/telvis/capstone_report_1
- Source: https://github.com/telvis07/10_capstone
- Link to raw data: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
- Link to ngram models with MLE probabilities: https://github.com/telvis07/shiny_demo_word_predictor/tree/master/models (a minimal sketch of MLE estimation follows below)
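The models above store maximum-likelihood (MLE) probability estimates. As a minimal sketch (with made-up counts and column names, not the repository's actual code), the MLE probability of a word given its prefix is just the n-gram count divided by the prefix count:

```r
# Hypothetical counts for illustration; the real models in the repo are
# much larger tables built from the SwiftKey corpus.
bigram_counts  <- data.frame(prefix = c("to", "to", "of"),
                             word   = c("be", "the", "the"),
                             count  = c(30, 10, 60))
unigram_counts <- data.frame(word  = c("to", "of", "the", "be"),
                             count = c(40, 60, 70, 30))

# MLE estimate: P(word | prefix) = count(prefix word) / count(prefix)
mle <- merge(bigram_counts, unigram_counts,
             by.x = "prefix", by.y = "word",
             suffixes = c("_bigram", "_prefix"))
mle$prob <- mle$count_bigram / mle$count_prefix
mle[order(-mle$prob), c("prefix", "word", "prob")]
```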
## Retrospective
I enjoyed the course. The Capstone took far more time than I expected because I struggled with a few issues:
- First, I wish I had taken the online NLP course (https://www.youtube.com/watch?v=-aMYz1tMfPg) before starting the Capstone.
- Installing RWeka and rJava gave me trouble, and it took several days to work through the issues. I eventually switched to quanteda (https://cran.r-project.org/web/packages/quanteda/vignettes/quickstart.html); a short tokenization sketch follows this list.
- I also waited far too long to set up a way to evaluate my model on a held-out subset of the training data, so I could tell whether changes improved or hurt prediction accuracy (see the split sketch after this list). It turned out that a model trained on a 25% sample performed just as well as one trained on the full dataset, so I should have spent that time trying different models on the 25% sample.
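Here is a minimal sketch of the quanteda workflow mentioned above, using a toy text vector; the object names are illustrative and not taken from the capstone code:

```r
library(quanteda)

# Toy corpus; the real project reads the blog, news, and twitter files instead
text <- c("the quick brown fox jumps over the lazy dog",
          "the quick brown fox is quick")

# Tokenize, drop punctuation, lowercase, then form trigrams
toks     <- tokens_tolower(tokens(text, remove_punct = TRUE))
trigrams <- tokens_ngrams(toks, n = 3, concatenator = " ")

# Count trigram frequencies via a document-feature matrix
dfm_tri <- dfm(trigrams)
sort(colSums(dfm_tri), decreasing = TRUE)
```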
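And a sketch of the train/test split I wish I had built earlier, assuming the file layout from the raw data zip linked above; the split proportions are just examples:

```r
set.seed(42)
lines <- readLines("final/en_US/en_US.twitter.txt", encoding = "UTF-8")

# Hold out 10% of lines for evaluation, then sample 25% of the rest for training
test_idx   <- sample(seq_along(lines), size = floor(0.10 * length(lines)))
test_set   <- lines[test_idx]
train_pool <- lines[-test_idx]
train_set  <- sample(train_pool, size = floor(0.25 * length(train_pool)))

# The same held-out test_set can then score every model variant
# (e.g. top-1 / top-3 next-word accuracy) so changes are directly comparable.
```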
I'm thankful for the discussion forums and the final peer review process; both helped me see how to improve my model and demo application. I really appreciate the instructors' work in creating this specialization. I've learned a lot.