In this chapter we’ll explore two topics that have become especially hot over the past 5 to 10 years: social networks and data journalism. Social networks (not necessarily just online ones) have been studied by sociology departments for decades, as has their counterpart in computer science, math, and statistics departments: graph theory. However, with the emergence of online social networks such as Facebook, LinkedIn, Twitter, and Google+, we now have a rich new source of data, which opens up many research problems from both a social science and a quantitative/technical point of view.

We’ll hear first about how one company, Morningside Analytics, visualizes and finds meaning in social network data, as well as some of the underlying theory of social networks. From there, we look at constructing stories that can be told from social network data, which is a form of data journalism. Thinking of the data scientist profiles—and in this case, gene expression is an appropriate analogy—the mix of math, stats, communication, visualization, and programming required to do either data science or data journalism is slightly different, but the fundamental skills are the same. At the heart of both is the ability to ask good questions, to answer them with data, and to communicate one’s findings. To that end, we’ll hear briefly about data journalism from the perspective of Jon Bruner, an editor at O’Reilly.

Sometimes, order is important.

Counting all the possible ways in which you can order things is time-consuming, but the trouble is, this sort of information is crucial for calculating some probabilities. In this chapter, we’ll show you a quick way of deriving this sort of information without you having to figure out what all of the possible outcomes are. Come with us and we’ll show you how to count the possibilities.

One of the biggest sporting events in Statsville is the Statsville Derby. Horses and jockeys travel from far and wide to see which horse can complete the track in the shortest time, and you can place bets on the outcome of each race. There’s a lot of money to be made if you can predict the top three finishers in each race.

The opening set of races is for rookies, horses that have never competed in a race before. This time, no statistics are available for previous races to help you anticipate how well each horse will do. This means you have to assume that each horse has an equal chance of winning, and it all comes down to simple probability.
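
As a rough illustration of the kind of counting involved, here is a minimal Python sketch. It assumes a hypothetical field of five horses (the text does not say how many horses run) and assumes that every finishing order is equally likely. The function names and horse labels are made up for this example. The sketch counts the ordered top-three finishes two ways: by listing every outcome, and by multiplying n × (n - 1) × (n - 2).

```python
from itertools import permutations
from math import perm  # available in Python 3.8 and later


def ordered_top_three_count(num_horses: int) -> int:
    """Number of ways to fill 1st, 2nd and 3rd place, where order matters.

    This equals num_horses * (num_horses - 1) * (num_horses - 2).
    """
    return perm(num_horses, 3)


def exact_top_three_probability(num_horses: int) -> float:
    """Chance of naming the top three in the exact order.

    Assumes every ordering is equally likely (rookie horses, no form data).
    """
    return 1 / ordered_top_three_count(num_horses)


# Hypothetical field of five horses, labeled A to E for illustration only.
horses = ["A", "B", "C", "D", "E"]

# The slow way: list every possible ordered top three and count them.
listed_outcomes = list(permutations(horses, 3))

# The quick way: use the counting formula without listing anything.
quick_count = ordered_top_three_count(len(horses))

print(len(listed_outcomes), quick_count)
print(exact_top_three_probability(len(horses)))
```

Both counts agree, but the formula keeps working when the field is too large to list by hand.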

Many statistical techniques used in education and psychology are common to other fields of endeavor: these include the t-test (covered in Chapter 8), various regression and ANOVA models (covered in Chapters 12–15), and the chi-square test (covered in Chapter 10). The discussion of measurement in Chapter 1 will also prove useful since much of educational and psychological research involves constructs that cannot be observed directly and have no obvious units of measurement. Examples of such constructs include mechanical aptitude, self-efficacy, and resistance to change. This chapter concentrates on statistical procedures used in the field of psychometrics, which is concerned with the creation, validation, and use of tests and measurements applied to human intelligence, knowledge, abilities, and psychological characteristics such as personality traits.

The first question you may ask with regard to the use of statistics in education and psychology is why they are necessary at all. After all, isn’t every person an individual, and isn’t the point of education and psychology to perceive that person in all their individual richness, not to reduce them to a set of numbers or place them in comparison with others who may not really be comparable at all?

There are many statistical resources available on the Internet, and no published list could possibly be complete, nor would it want to be; too much information can be as bad as too little. As is true of the Internet in general, not every resource online is accurate or reliable, so it’s up to the user to decide whether a particular resource is appropriate for their use. The web pages listed here are all maintained by reputable sources, including the federal government, university departments of statistics, professional statisticians, and companies that produce widely used statistical products.

The Statistics Online Computational Resource

Many resources, including interactive tools and course materials, from the UCLA Statistics Online Computational Resource.

Rice Virtual Lab in Statistics

A collection of resources, including an online textbook, simulations and demonstrations, case studies, and statistical analysis tools.

Web Pages that Perform Statistical Calculations

Links to many tools, including statistical decision trees, free statistical software, online calculators, and graphing programs, maintained by John C. Pezzullo, a retired professor of biostatistics and pharmacology.

