Book title: R Data Visualization Recipes
Author: Vitor Bianchi Lanzetta
Chapter word count: 296
Last updated: 2021-07-02 23:33:34

How it works...

The first step loads both ggplot2 and the car package: ggplot2 draws the plot, while car supplies the Salaries data set from which the outliers are taken. A new object, out_data, is created to hold only the outliers. It is obtained by subsetting Salaries with brackets, where the function match() picks only the rows containing outliers, that is, the salary values returned by boxplot.stats(Salaries$salary)$out. For each of those values, match() returns the position of its first occurrence in Salaries$salary.
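The subsetting step can be sketched on a small made-up data frame. The name toy and its values are hypothetical stand-ins for Salaries, so the snippet runs without car. The sketch also shows one caveat worth checking. Because match() returns only the first position of each value, two outliers sharing the same salary map to the same row. In contrast, %in% keeps every matching row.

```r
# Hypothetical toy data standing in for Salaries (values are made up)
toy <- data.frame(
  rank   = rep(c("AsstProf", "AssocProf", "Prof"), length.out = 11),
  salary = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 100, 100)
)

# Salary values flagged as outliers by boxplot.stats() (default coef = 1.5)
out_values <- boxplot.stats(toy$salary)$out

# Approach from the text: match() gives the FIRST row holding each outlying value
rows_match <- match(out_values, toy$salary)
out_data   <- toy[rows_match, ]

# Alternative: %in% flags every row whose salary is among the outliers
out_data_all <- toy[toy$salary %in% out_values, ]

print(rows_match)              # 10 10: row 10 twice, row 11 never selected
print(rownames(out_data_all))  # "10" "11"
```

With the real data, the %in% version would read Salaries[Salaries$salary %in% boxplot.stats(Salaries$salary)$out, ]. Whether repeated outlying salaries actually occur in Salaries is worth checking before choosing between the two forms.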

Brackets [ ] can be added after the name of a vector, data frame, or list to select a subset along its dimensions (elements, rows, columns). Data frames have two dimensions, indexed as [<rows>, <columns>].
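A few bracket selections illustrate the idea. The data frame df, the list lst, and their contents are hypothetical and made up for the example.

```r
# Hypothetical data frame with two columns
df <- data.frame(
  rank   = c("Prof", "AsstProf", "Prof"),
  salary = c(150000, 80000, 120000)
)

v <- df$salary
second_value <- v[2]                      # vector: one dimension, element 2
second_row   <- df[2, ]                   # data frame: row 2, all columns
salary_col   <- df[, "salary"]            # all rows, column "salary" (returned as a vector)
high_paid    <- df[df$salary > 100000, ]  # rows chosen by a logical condition

lst <- list(a = 1, b = "x")
sub_list <- lst["b"]                      # single brackets on a list return a sub-list
element  <- lst[["b"]]                    # double brackets return the element itself
```

Leaving one side of the comma empty, as in df[2, ] or df[, "salary"], selects everything along that dimension.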

In short, this step filtered the Salaries data frame on the salary variable. It kept only the rows that boxplot.stats() flags as outliers under its default settings. Those are values lying more than 1.5 times the box length (the distance between the hinges) away from the box.
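The boxplot.stats() rule can be made concrete with a made-up vector x. Values lying more than coef times the box length beyond the hinges are returned in $out, and coef defaults to 1.5. Raising coef therefore widens the fences and flags fewer points.

```r
# Made-up sample: nine regular values plus two large ones
x <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 100)

# fivenum() returns the hinges that boxplot.stats() builds on
hinges      <- fivenum(x)[c(2, 4)]            # lower and upper hinge: 12.5 and 17.5
box_length  <- diff(hinges)                   # 5
upper_fence <- hinges[2] + 1.5 * box_length   # 25

out_default <- boxplot.stats(x)$out             # coef = 1.5: 30 and 100 are outliers
out_wide    <- boxplot.stats(x, coef = 3)$out   # upper fence at 32.5: only 100 remains
```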

The next step draws the box plot. Because pseudo-random noise is added to the points, it begins by calling set.seed() so that the result can be reproduced exactly. The following lines store the basic aesthetic mapping in a ggplot object named box1.
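You can check why set.seed() matters with base R's jitter(), which also draws uniform noise. The vector x and the seed value 50 are arbitrary choices for the example. Resetting the seed before a call reproduces the same noise, while a second call without resetting does not.

```r
x <- c(1, 2, 3)

set.seed(50)
j1 <- jitter(x, amount = 0.2)   # adds uniform noise in [-0.2, 0.2]
j2 <- jitter(x, amount = 0.2)   # RNG state has moved on: different noise

set.seed(50)
j3 <- jitter(x, amount = 0.2)   # same seed again: identical to j1
```

Recent ggplot2 releases (3.0.0 onward) also accept a seed argument in position_jitter(). That is an alternative when only the jitter layer needs to be reproducible.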

The next lines add the box geometry by calling geom_boxplot(). Besides drawing the boxes, this function is asked to notch them (notch = T) and to hide the default outlier points (outlier.shape = NA). geom_jitter() is then stacked on top to plot the outliers with jitter.
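Putting the steps together, the recipe can be sketched as follows. This is a reconstruction under stated assumptions, not the book's exact listing. Mapping rank to the x axis, the seed value 50, and height = 0 with width = 0.2 are illustrative choices. Salaries now lives in the carData package, which library(car) attaches.

```r
library(ggplot2)
library(car)   # attaches carData, which provides the Salaries data set

# Rows of Salaries whose salary is flagged as an outlier by boxplot.stats()
out_data <- Salaries[match(boxplot.stats(Salaries$salary)$out, Salaries$salary), ]

set.seed(50)   # assumed seed value; any fixed number makes the jitter reproducible

# Basic aesthetic mapping (x = rank is an assumption)
box1 <- ggplot(Salaries, aes(x = rank, y = salary))

p <- box1 +
  geom_boxplot(notch = TRUE, outlier.shape = NA) +        # notched boxes, default outlier points hidden
  geom_jitter(data = out_data, height = 0, width = 0.2)   # only the outliers, jittered horizontally

print(p)
```

boxplot.stats() here pools all ranks. The highlighted points are therefore outliers of the overall salary distribution, not necessarily of each rank's individual box.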

Note how the previously created out_data is passed to the data argument of this last function, so only the outliers are jittered. The height and width arguments give finer control over the outliers. The first controls how much noise can be added vertically, while the second sets the analogous limit horizontally. Noise is added in both directions, so the total spread is twice the value given.
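The effect of height and width can be checked numerically with ggplot2's layer_data(), which returns the positions ggplot2 actually computed for a layer. The data frame pts and the values height = 0 and width = 0.1 are hypothetical. The check confirms that height = 0 leaves y untouched and that horizontal noise stays within +/- width of each category's position.

```r
library(ggplot2)

# Hypothetical points: three categories, two points each
pts <- data.frame(group = rep(c("a", "b", "c"), each = 2),
                  value = c(1, 2, 3, 4, 5, 6))

set.seed(1)
p <- ggplot(pts, aes(x = group, y = value)) +
  geom_jitter(height = 0, width = 0.1)   # horizontal noise only

ld <- layer_data(p)                       # computed positions for layer 1

# Discrete categories sit at x = 1, 2, 3; unclass() drops ggplot2's internal class on x, if any
x_num     <- unclass(ld$x)
x_offsets <- x_num - round(x_num)         # distance from the nearest category position
```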

Data frame and statistical manipulations played a major role in this recipe. Such manipulations are often required when more customized results are desired. Now let's look at bivariate dot plots.