
CISC 7412X
Homeworks
HW1: Download a large collection of text. I highly recommend Project Gutenberg (google for it; you can download the entire DVD).
Part 1: Write a parser/reader that collects the probability of every word following every other word. Convert everything to lowercase. Punctuation breaks the sequence (e.g., at the end of a sentence), so no word pair should span a punctuation mark.
Part 2: Using the probabilities from Part 1, generate text: start with a random word, then randomly pick the following word (weighted by the probabilities), and continue.
Email the code for the project and a sample of the generated output. Other things to experiment with: gathering the probabilities of a word following a pair or triplet of words, etc. (originally assigned 20140910)
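A minimal Python sketch of both parts follows. It is one possible approach, not a required solution. The helper names (`segments`, `train`, `probabilities`, `generate`) are hypothetical. It assumes that any character other than an ASCII letter, digit, apostrophe, or whitespace counts as punctuation. The estimated probability is P(next | context) = count(context, next) / (total count of words seen after context). The `order` parameter covers the pair/triplet extension: `order=1` gives single-word contexts, and `order=2` or `3` gives pairs or triplets. How to handle a dead end (a context only ever seen just before punctuation) is not specified in the assignment. Restarting from a fresh random context is an assumption made here.

```python
import random
import re
from collections import Counter, defaultdict

# Assumption: anything that is not an ASCII letter, digit, apostrophe, or
# whitespace counts as punctuation and breaks the word sequence.
PUNCT = re.compile(r"[^a-z0-9'\s]+")


def segments(text):
    """Yield lists of lowercase words; each punctuation run starts a new list."""
    for chunk in PUNCT.split(text.lower()):
        words = [w.strip("'") for w in chunk.split()]
        words = [w for w in words if w]  # drop tokens that were only apostrophes
        if words:
            yield words


def train(text, order=1):
    """Count how often each word follows each `order`-word context.

    order=1 gives word pairs; order=2 or 3 gives the pair/triplet extension.
    """
    model = defaultdict(Counter)
    for words in segments(text):
        # Transitions are counted only inside a segment, never across punctuation.
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            model[context][words[i + order]] += 1
    return model


def probabilities(model):
    """Turn counts into P(next | context) = count / total for that context."""
    probs = {}
    for context, counter in model.items():
        total = sum(counter.values())
        probs[context] = {w: c / total for w, c in counter.items()}
    return probs


def generate(model, length=30, rng=random):
    """Emit `length` words by weighted random sampling from the model."""
    contexts = list(model)
    context = rng.choice(contexts)  # random start; IndexError if model is empty
    out = list(context)
    while len(out) < length:
        followers = model.get(context)
        if not followers:
            # Dead end: this context was only ever seen right before
            # punctuation. Assumption: restart from a fresh random context.
            context = rng.choice(contexts)
            out.extend(context)
            continue
        # Weighted pick: a word that followed this context twice as often
        # is twice as likely to be chosen.
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        out.append(nxt)
        context = tuple(out[-len(context):])  # slide the context window
    return " ".join(out[:length])
```

For a downloaded book, something like `print(generate(train(open("book.txt", encoding="utf-8").read(), order=2)))` would print 30 generated words. Here `book.txt` is a placeholder file name. Counting with `Counter` and normalizing only when needed keeps memory proportional to the number of distinct transitions. `random.choices` then samples directly from those raw counts as weights.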