Data Science from Scratch -- Chapter 18

Category: learning books Data_Science_from_Scratch
June 2, 2020

I had played with neural networks a little before coming to this book. The book moves through the topic pretty quickly, which is both good and bad: good in that it doesn’t get bogged down in the details, but bad because the explanations feel a little skimpy.

Data Science from Scratch -- Chapter 17

Category: learning books Data_Science_from_Scratch
May 28, 2020

Decision trees are just a set of instructions (basically a flowchart) that take you down different paths depending on your answers. In this context, they are used for more complex categorization schemes. (Decision trees can also be built as regression trees, which produce numerical outputs instead of categorical ones, but the book does not discuss them.) The book is more interested in conveying the ideas behind decision trees than in creating an algorithm to optimize one.
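
To picture the flowchart idea, a finished tree can be stored as nested questions that a record is walked down until it reaches an answer. This is a minimal, hand-built sketch rather than the book's code: the `Leaf`/`Split` names, the `classify` helper, and the "go for a walk" tree with its `raining` and `temperature` attributes are all hypothetical. Nothing here learns the tree from data; it only shows how a tree turns answers into a category.

```python
from typing import Any, Dict, NamedTuple, Union


class Leaf(NamedTuple):
    """End of a path through the flowchart: the category we land on."""
    value: Any


class Split(NamedTuple):
    """A question: look up one attribute and follow the branch for its value."""
    attribute: str
    subtrees: Dict[Any, Any]   # answer -> Leaf or Split
    default_value: Any = None  # returned when an answer has no matching branch


DecisionTree = Union[Leaf, Split]


def classify(tree: DecisionTree, record: Dict[str, Any]) -> Any:
    """Follow the flowchart from the root until a Leaf is reached."""
    while isinstance(tree, Split):
        answer = record.get(tree.attribute)
        if answer not in tree.subtrees:
            return tree.default_value
        tree = tree.subtrees[answer]
    return tree.value


# Hypothetical "should I go for a walk?" flowchart, built by hand for illustration.
walk_tree = Split("raining", {
    True: Leaf(False),
    False: Split("temperature", {
        "cold": Leaf(False),
        "mild": Leaf(True),
        "hot": Leaf(False),
    }, default_value=True),
}, default_value=False)

print(classify(walk_tree, {"raining": False, "temperature": "mild"}))  # True
print(classify(walk_tree, {"raining": True}))  # False
```

Each `Split` asks exactly one question, and `default_value` covers answers the tree has no branch for, so an unexpected input still gets a category instead of raising an error.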

Data Science from Scratch -- Chapter 16

Category: learning books Data_Science_from_Scratch
May 28, 2020

The idea behind logistic regression is similar to that of linear and multiple regression: the goal is to predict a feature of a data point. The difference is that logistic regression is used for classification, specifically for determining whether something is in or out of a particular category.
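
To make the in-or-out idea concrete, here is a minimal sketch built on the logistic function \(1/(1 + e^{-t})\), which squashes a linear score into a probability between 0 and 1; a point is placed in the category when that probability reaches a threshold (0.5 here, an assumption). The coefficients are fit with plain batch gradient descent on the log loss, which is one standard approach and not necessarily the book's. The function names, learning rate, epoch count, and hours-studied data are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def logistic(t: float) -> float:
    """Map any real number into (0, 1); written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def predict_probability(x: Vector, beta: Vector) -> float:
    """Estimated probability that x is in the category (x[0] is a constant 1 for the intercept)."""
    return logistic(sum(x_j * b_j for x_j, b_j in zip(x, beta)))


def classify(x: Vector, beta: Vector, threshold: float = 0.5) -> int:
    """1 (in the category) or 0 (out), depending on which side of the threshold we land."""
    return 1 if predict_probability(x, beta) >= threshold else 0


def fit(xs: List[Vector], ys: List[int],
        learning_rate: float = 0.1, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the average negative log-likelihood (log loss)."""
    beta = [0.0] * len(xs[0])
    for _ in range(epochs):
        gradient = [0.0] * len(beta)
        for x, y in zip(xs, ys):
            error = predict_probability(x, beta) - y  # derivative of log loss w.r.t. the score
            for j, x_j in enumerate(x):
                gradient[j] += error * x_j
        beta = [b - learning_rate * g / len(xs) for b, g in zip(beta, gradient)]
    return beta


# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
xs = [[1.0, h] for h in hours]  # leading 1.0 is the intercept term

beta = fit(xs, passed)
print(classify([1.0, 0.5], beta), classify([1.0, 5.0], beta))  # 0 1
```

The data deliberately overlap (one early pass, one late fail) so the best-fit coefficients stay finite; on perfectly separable data, plain gradient descent keeps growing the coefficients without settling.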

Data Science from Scratch -- Chapter 14

Category: learning books Data_Science_from_Scratch
May 27, 2020

A simple linear regression is a best-fit line that shows the underlying relationship between two variables. For most datasets this relationship will be imperfect, so the model also includes an error term. The idea is to find a relationship between the variables \(x_i\) and \(y_i\) of the form \(y_i = \beta x_i + \alpha + \varepsilon_i\), where \(\alpha\) and \(\beta\) are constants and \(\varepsilon_i\) is the error term, which is hopefully relatively small.
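
The usual way to choose \(\alpha\) and \(\beta\) is least squares: pick the values that minimize \(\sum_i \varepsilon_i^2\). That has a closed-form answer, \(\beta = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2\) and \(\alpha = \bar{y} - \beta \bar{x}\) (equivalently, \(\beta\) is the correlation times \(s_y / s_x\)). The sketch below computes it directly; the function names and toy data are illustrative and not taken from the book.

```python
from statistics import mean
from typing import Sequence, Tuple


def least_squares_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta) minimizing the sum of squared errors epsilon_i."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    s_xx = sum((x_i - x_bar) ** 2 for x_i in x)
    beta = s_xy / s_xx  # raises ZeroDivisionError if every x_i is the same
    alpha = y_bar - beta * x_bar
    return alpha, beta


def r_squared(alpha: float, beta: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Fraction of the variation in y captured by the line (1.0 means a perfect fit)."""
    y_bar = mean(y)
    sse = sum((y_i - (beta * x_i + alpha)) ** 2 for x_i, y_i in zip(x, y))
    total = sum((y_i - y_bar) ** 2 for y_i in y)
    return 1.0 - sse / total


# Made-up data lying exactly on y = 3x - 5, so the fit should recover it.
xs = list(range(-10, 11))
ys = [3 * x_i - 5 for x_i in xs]
alpha, beta = least_squares_fit(xs, ys)
print(alpha, beta, r_squared(alpha, beta, xs, ys))  # -5.0 3.0 1.0
```

With noisy data the recovered constants will differ from the true ones, and \(R^2\) drops below 1 in proportion to how much of the variation the error terms account for.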

Final Run with COVID Data

Category: learning
May 26, 2020

After playing around for a few days, I’ve decided to bring my efforts to a close on the COVID Statistics by County page. It is now a bit more flexible in what it shows. I don’t think there’s much more to learn regarding this particular visualization method. Some of the code is definitely sloppy and can be cleaned up (especially the naming and the calculation of the running averages), but I’m going to just let that go. It’s not important enough for me to refine this code.
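
For the running averages mentioned above, one tidy approach is a trailing window that keeps only the most recent days. This is a minimal sketch, not the page's actual code: the `running_average` name, the 7-day default, and the choice to average over fewer days at the start of the series (rather than leaving those days blank) are all assumptions.

```python
from collections import deque
from typing import Deque, List, Sequence


def running_average(values: Sequence[float], window: int = 7) -> List[float]:
    """Trailing average over the last `window` days, one value per input day.

    The first window - 1 days are averaged over however many days exist so far,
    so the output is the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[float] = deque(maxlen=window)  # oldest day drops off automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


print(running_average([1, 2, 3, 4, 5], window=3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Using `deque(maxlen=window)` keeps the window bookkeeping in one place and makes the window size a single adjustable parameter; recomputing `sum(recent)` each day costs time proportional to the window, which is negligible for week-scale windows.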

First Run with COVID Data

Category: learning
May 22, 2020

Late last night, I got the basic code working to display the New York Times COVID-19 dataset by county. The current code does nothing more than that. I would like it to display new cases and new deaths rather than the totals, and to apply smoothing over an adjustable number of days. I would also like it to display other information, such as the first infection, total cases, and total deaths. Part of me also wants to display a map of the state and county, but I’m not sure that I really want to do that.
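
Getting new cases and new deaths from the totals amounts to taking the difference between consecutive days for each county. A minimal sketch, assuming each county's totals are already in date order with one value per day and that the count was zero before the first day; the `daily_new` name and the numbers are hypothetical.

```python
from typing import List, Sequence


def daily_new(totals: Sequence[int]) -> List[int]:
    """Convert cumulative running totals into per-day new counts."""
    previous = [0] + list(totals[:-1])  # assume the count was 0 before the first day
    return [today - yesterday for yesterday, today in zip(previous, totals)]


cumulative_cases = [1, 3, 3, 7, 12]  # hypothetical totals for one county
print(daily_new(cumulative_cases))  # [1, 2, 0, 4, 5]
```

If a cumulative total is ever revised downward, the corresponding daily value comes out negative; this sketch leaves it as-is rather than guessing how to correct it.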
