# Results for: “Drew Conway”

2 eBooks

## 11. Analyzing Social Graphs
Drew Conway | O'Reilly Media | ePub

Social networks are everywhere. According to Wikipedia, there are over 200 active social networking websites on the Internet, excluding dating sites. As you can see from Figure 11-1, according to Google Trends there has been a steady and constant rise in global interest in social networks since 2005. This is perfectly reasonable: the desire for social interaction is a fundamental part of human nature, and it should come as no surprise that this innate social nature would manifest in our technologies. But the mapping and modeling of social networks is by no means news. In the mathematics community, an example of social network analysis at work is the calculation of a person's Erdős number, which measures her distance from the prolific mathematician Paul Erdős. Erdős was arguably the most prolific mathematician of the 20th century and published over 1,500 papers during his career. Many of these papers were coauthored, and Erdős numbers measure a mathematician's distance from the circle of coauthors that Erdős enlisted. If a mathematician coauthored with Erdős on a paper, then she would have an Erdős number of one, i.e., her distance to Erdős in the network of 20th-century mathematics is one. If another author collaborated with one of Erdős's coauthors but not with Erdős directly, then that author would have an Erdős number of two, and so on. This metric has been used, though rarely seriously, as a crude measure of a person's prominence in mathematics. Erdős numbers allow us to quickly summarize the massive network of mathematicians orbiting around Paul Erdős.
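
The definition above amounts to a shortest-path distance in a coauthorship graph, which a breadth-first search can compute. The following is a minimal sketch in Python, not code from the book: the function name `erdos_numbers` and the toy network (authors "A" through "E") are hypothetical and exist only to illustrate the idea.

```python
from collections import deque


def erdos_numbers(coauthors, source="Erdős"):
    """Return each reachable author's distance from `source`.

    `coauthors` maps an author to the list of people they have
    coauthored with (an undirected graph stored as adjacency lists).
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in distance:
                # First visit in a breadth-first search is the shortest path.
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


# Hypothetical toy network, invented for illustration only.
toy_network = {
    "Erdős": ["A", "B"],
    "A": ["Erdős", "C"],
    "B": ["Erdős"],
    "C": ["A", "D"],
    "D": ["C"],
    "E": [],  # never collaborated with anyone connected to Erdős
}

numbers = erdos_numbers(toy_network)
```

In this toy network, "A" and "B" coauthored with Erdős directly and get an Erdős number of one, "C" gets two, and "D" gets three. "E" has no path to Erdős, so it is simply absent from the result.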

## 6. Regularization: Text Regression
Drew Conway | O'Reilly Media | ePub

While we told you the truth in Chapter 5 when we said that linear regression assumes that the relationship between two variables is a straight line, it turns out you can also use linear regression to capture relationships that aren't well-described by a straight line. To show you what we mean, imagine that you have the data shown in panel A of Figure 6-1.

Figure 6-1. Modeling nonlinear data: (A) visualizing nonlinear relationships; (B) nonlinear relationships and linear regression; (C) structured residuals; (D) results from a generalized additive model

It's obvious from looking at this scatterplot that the relationship between X and Y isn't well-described by a straight line. Indeed, plotting the regression line shows us exactly what will go wrong if we try to use a line to capture the pattern in this data; panel B of Figure 6-1 shows the result. We can see that we make systematic errors in our predictions if we use a straight line: at small and large values of
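
To make the idea of structured residuals concrete, here is a minimal sketch in Python rather than the book's own code. It fits an ordinary least-squares line to a hypothetical parabola (the data are invented and are not those plotted in Figure 6-1) and then inspects the residuals. The helper name `fit_line` is an assumption introduced for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical nonlinear data: y is exactly x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)

# Residual = observed value minus the line's prediction.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

For this particular data set the fitted line is flat, and the residuals are positive at both ends of the X range and negative in the middle. Errors that follow a pattern like this, instead of scattering randomly around zero, indicate that a straight line is missing structure in the data.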

## 1. Using R
Drew Conway | O'Reilly Media | ePub

Machine learning exists at the intersection of traditional
mathematics and statistics with software engineering and computer science.
In this book, we will describe several tools from traditional statistics
that allow you to make sense of that world. Statistics has almost always
been concerned with learning something interpretable from data, while
machine learning has been concerned with turning data into something
practical and usable. This contrast makes it easier to understand the term
In machine learning, the

## Works Cited
Drew Conway | O'Reilly Media | ePub

- [Adl10] Joseph Adler. R in a Nutshell. O'Reilly Media, 2010.
- [Abb92] Edwin A. Abbott. Flatland: A Romance of Many Dimensions. Dover Publications, 1992.
- [Bis06] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer; 1st ed. 2006. Corr.; 2nd printing ed. 2007.
- [GH06] Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.
- [HTF09] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, 2009.
- [JMR09] Owen Jones, Robert Maillardet, and Andrew Robinson. Introduction to Scientific Programming and Simulation Using R. Chapman and Hall, 2009.
- [Seg07] Toby Segaran. Programming Collective Intelligence: Building Smart Web 2.0 Applications. O'Reilly Media, 2007.
- [Spe08] Phil Spector. Data Manipulation with R. Springer, 2008.
- [Wic09] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer, 2009.
- [Wil05] Leland Wilkinson. The Grammar of Graphics. Springer, 2005.

## 5. Regression: Predicting Page Views
Drew Conway | O'Reilly Media | ePub

In the abstract, regression is a very simple concept: you want to predict one set of numbers given another set of numbers. For example, actuaries might want to predict how long a person will live given their smoking habits, while meteorologists might want to predict the next day's temperature given the previous day's temperature. In general, we'll call the numbers you're given inputs and the numbers you want to predict outputs. You'll also sometimes hear people refer to the inputs as predictors or features. What makes regression different from classification is that the outputs are really numbers. In classification problems like those we described in Chapter 3, you might use numbers as a dummy code for a categorical distinction so that 0 represents ham and 1 represents spam. But these numbers are just symbols; we're not exploiting the numberness of 0 or 1 when we use dummy variables. In regression, the essential fact about the outputs is that they really are numbers: you want to predict things like temperatures, which could be 50 degrees or 71 degrees. Because you're predicting numbers, you want to be able to make strong statements about the relationship between the inputs and the outputs: you might want to say, for example, that when the number of packs of cigarettes a person smokes per day doubles, their predicted life span gets cut in half.
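
The distinction between dummy codes and genuinely numeric outputs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the book: the labels, the temperatures, and all variable names are invented.

```python
# Hypothetical labels and measurements, invented for illustration only.
labels = ["ham", "spam", "spam", "ham"]
temperatures = [50.0, 71.0, 64.0]

# Dummy coding: 0 stands for ham and 1 stands for spam, as in the text.
dummy_code = {"ham": 0, "spam": 1}
encoded = [dummy_code[label] for label in labels]

# The codes are arbitrary symbols: flipping the assignment still
# identifies exactly the same messages as spam.
flipped_code = {"ham": 1, "spam": 0}
flipped = [flipped_code[label] for label in labels]

spam_positions = [
    i for i, code in enumerate(encoded) if code == dummy_code["spam"]
]
spam_positions_flipped = [
    i for i, code in enumerate(flipped) if code == flipped_code["spam"]
]

# Regression outputs are real quantities, so arithmetic on them is meaningful.
mean_temperature = sum(temperatures) / len(temperatures)
```

Both codings pick out the same spam messages, which is the sense in which 0 and 1 are only symbols. The mean temperature, by contrast, depends on the actual magnitudes of the values.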