how to make a scatter plot in Excel

Scatter plots are excellent charts for showing a relationship between two numerical variables across a number of unique observations. We see them in business communications from time to time, although they're much more commonly used in the "exploration" part of the process, when we're still trying to understand our data and find the important insights.

If you're unfamiliar with scatter plots, their common use cases, or their benefits and drawbacks in a range of scenarios, check out the "what is a scatter plot?" article in our SWD Chart Guide. There, we explore some of the basics of scatter plots via an example, share tips for designing them more effectively, and discuss common variations (bubble charts, connected scatter plots, and more).

In this article we'll walk through the steps of creating a scatter plot in Microsoft Excel. We'll use a small dataset to:

  • create a simple scatter plot with a single data series;

  • modify that graph to show multiple data series in one scatter plot;

  • learn how to add contextual elements to our view (like averages, quadrant lines, and trendlines);

  • add data labels to all, or simply a few, points in our graph; and

  • create custom labels using other fields in our dataset.

The scenario

Four years ago, our organization wanted to find a way to make newly-hired junior analysts more successful and effective. We launched a small, competitive pilot program that would start new employees with a full year of dedicated and comprehensive training. All other junior hires would continue to receive the on-the-job, course-based and ad-hoc learning experience that we have traditionally provided.

We currently have 20 individuals who have completed the program. Each month, they and all the analysts in the organization have their performance rated as below-average, average, or above-average in comparison to their peers across our 3 global locations.

The following table shows our pilot program graduates, the number of months they've been with the organization, and how many months their performance has been rated "above average."

This data table is the source for our scatterplot.

We'll use this dataset to create and refine our scatter plots, reshaping it and adding to it as needed.

How to create a single-series scatter plot

The simplest way to create a scatter plot in Excel is to highlight the cells in the two columns that contain your two numeric variables: in this case, the "MONTHS OF EMPLOYMENT" and "MONTHS WITH ABOVE-AVERAGE PERFORMANCE" columns.

XY columns highlighted in our data table.

Highlight the two columns you want to include in your scatter plot.

And then, go to the "Insert" tab of your Excel menu bar and click on the scatter plot icon in the "Recommended Charts" area of your ribbon.

Select scatter plot from the menu.

Select "Scatter" from the options in the "Recommended Charts" section of your ribbon.

Excel will automatically create a scatter plot for you in the same sheet as your data, using the first column of your dataset as the horizontal (X) axis, and the second column as your vertical (Y) axis.

A quick note here: in creating scatter plots, a common practice is to make the horizontal axis your "independent variable" and the vertical axis your "dependent variable" (that is, the number that is likely to change based on the value of our independent variable).

For our scenario, the number of months a person has been employed is more likely to affect the number of "above average" ratings they receive, rather than vice versa. That's why our independent variable (months of employment) is in our data table's left-hand column, and our dependent variable is in the right-hand column.

Excel creates a scatter plot with these default settings.

This automatically-generated graph could use some formatting and cleanup. Taking time to strengthen the skeleton of your graph (everything that isn't the actual data points) will help make your insights and data stand out.

After removing unnecessary lines, and cleaning up our axes and titles, our graph looks like this:

Now we have a nicely-formatted, single-series scatter plot that uses an identical black circle as a marker for each of our unique data points. From here, we can continue to make modifications and refinements to our graph.

How to create a scatter plot with multiple series

In the scatter plot we've just created, there is only one data series, consisting of our entire cadre of pilot program participants. Each participant's length of employment is plotted on the horizontal axis, and their total of above-average monthly ratings is on the vertical axis.

Let's assume we wanted to subdivide this data series into multiple series. For instance, our participants are assigned to three different offices worldwide (United States, Canada, and Japan); what if we wanted to color our data markers to represent that person's location?

In Excel, creating a scatter plot with multiple data series can be done several ways. The easiest is to have a single column in your data containing the X values for all of your data series, and then have a separate column for the Y values of each individual data series.

Let's have a look at how we could alter our existing data table to do this.

Our original data table, which we'll alter in order to make a multi-series scatter plot.

In this table, the "OFFICE" is its own column, and it contains three unique values: United States, Japan, and Canada. Instead, add three new columns to the right of the existing table, and make each OFFICE value the name of one of the columns:

Add a column to the right for each of the three different offices.

In the cells of those columns, we'll write a formula that says "If the value of [OFFICE] in this specific row matches the header of this column, then give this cell the same value as the [MONTHS WITH ABOVE AVERAGE PERFORMANCE] column; otherwise, give it a value of #N/A."

In cell G5, add a formula to decide if the cell should be empty, or should contain the value from cell F5.

When we propagate this formula across our new columns and down all of our rows, the table will look like this:

Copy the Y values from column F into the appropriate column G, H, or I, based on if the OFFICE value in column A matches the header.

As you can see, our "United States" column only has numeric values if the "OFFICE" column value in that row is "United States."
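The same "match the header or return #N/A" logic can be sketched outside of Excel. In the worksheet itself, a formula along the lines of =IF($A5=G$4,$F5,NA()) would likely do the job, assuming the new column headers sit in row 4 (the header row is not stated in this article). The Python below is only an illustration: the rows are invented placeholder values rather than the article's dataset, split_by_office is a hypothetical helper name, and None stands in for Excel's #N/A.

```python
# Hypothetical sample rows: (office, months employed, months above average).
# These values are invented for illustration; they are not the article's data.
rows = [
    ("United States", 36, 20),
    ("Japan", 12, 9),
    ("Canada", 24, 10),
    ("United States", 18, 6),
]


def split_by_office(rows, offices):
    """Return one Y column per office; None plays the role of #N/A."""
    columns = {office: [] for office in offices}
    for office, _months_employed, months_above in rows:
        for name in offices:
            # Same test as the worksheet formula: keep the Y value only
            # when the row's office matches the column header.
            columns[name].append(months_above if office == name else None)
    return columns


offices = ["United States", "Canada", "Japan"]
y_columns = split_by_office(rows, offices)
for name in offices:
    print(name, y_columns[name])
```

Each new column has one entry per row of the original table, so all three series can share the same X column.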

When you click on the existing scatter plot, you'll see purple and blue highlighting around the X and Y columns that Excel is currently displaying in that graph.

Click on the scatter plot to highlight the columns Excel is using for the X (purple) and Y (blue) values.

We'd like this graph to show the Y values in the three new columns we've just created. To do that, hold your cursor over the border of the blue rectangle until it becomes a hand, and then drag that rectangle right by a single column, so that it's highlighting the data underneath "United States."

Click and hold the blue column, and drag it to the right by a single column.

You might notice that a lot of your data points are now missing! That's because now, Excel is only using the "United States" column for our Y axis, and Excel won't draw a data point if there's an "#N/A" as a Y value.

Not to worry, though: we'll get all our data points back now, by clicking on the bottom-right corner of that blue rectangle and dragging it to the right, so that the rectangle covers all three new columns we've created.

Click on the bottom-right corner of the blue rectangle, and drag that corner to the right so that all 3 new columns are highlighted.

All of our data has returned, hooray! And, as you can see, Excel is now using a different color for each of our data series.

Let's add a legend so that our viewers know what these different colors represent. First, we'll name each of our data series: right-click on the chart, choose "Select Data," and add the data series names manually in the pop-up window.

Type in the name of each series, or select a cell from the Excel sheet that contains the name.

Then, you can fine-tune the look of your graph (perhaps you add a legend as a subheader to your title, and pick specific colors for your series) and your multi-series scatter plot is ready to go.

This scatter plot contains the same data as our first one, but the data is now subdivided into three series (one for each office) and is color-coded to reflect that. We've also added a legend beneath our chart title to identify which color goes with which office.

How to add averages, reference lines, and trendlines to a scatter plot

Whenever we present data to an audience who might be unfamiliar with it, it's a good idea to include contextual information to help make it easier to understand. In a scatter plot, we can add context like:

  • What was the average X value?

  • What was the average Y value?

  • Was there a goal for either variable?

  • Is there a trend worth emphasizing?

Depending on the insights or the data itself, you might use one (or several) of these techniques, so let's go through them one at a time.

How to add an average point to a scatter plot

We'll start by adding another row to the bottom of our data table, where we'll calculate what the average X and Y values were. For simplicity's sake, we're only going to look at the average across ALL of our participants, rather than separate averages for each office (although the same techniques would apply).

Since columns E and F contain our X and Y values, we'll write formulas to average the values in each of those columns.

Row 24 will contain the averages of our X values (in column E) and our Y values (in column F). Use the AVERAGE formula, as shown in this screen capture, to generate the correct values.


Then, we'll right-click on our chart, choose "Select Data…" from the menu that pops up, and add another data series just for our average values. Click the "+" button below the "Legend entries (Series):" window to add a new series, which you can then rename and set the range of X and Y values for in that pop-up.

The "average" series will consist of a single point, showing the average of all X values and the average of all Y values.

That will put a single point on our chart that marks our X and Y averages, as you can see in the updated graph below.

I've deleted the new "average" series from the legend (by clicking once to highlight the legend, then a second time to highlight the average, and hitting "delete"), and formatted the single point to be gray with a black outline.

The average of X and Y values across all three data series is now shown in a single gray mark, outlined in black.

How to add reference lines and create quadrants in a scatter plot

Now that we have an average point, we can visually break our scatter plot into four quadrants:

  1. above average in both X and Y variables;

  2. below average in both;

  3. high X but low Y; and

  4. low X but high Y.


We'll use our average point as the basis for drawing the lines that define those quadrants. Specifically, we'll add a chart element to that point, and maybe not the one that you'd immediately expect: error bars.

Error bars are typically intended to show additional statistical context around a data point. Instead, we'll be using them to draw both a vertical and a horizontal line, each of which will connect our axes to the edges of our graph, running directly through our average point.

First, we'll need to know how long to make these error bars.

  • We know that each of our axes has a maximum value of 50, so we'll make sure our error bars cover that full distance.

  • Vertically, we'll draw one bar from the baseline to our average point, and one bar from our average point to the top of our graph.

  • Horizontally, we'll draw one bar from our Y-axis to our average point, and then one bar from the average point to the right-hand edge of our graph.

To define the exact length of these error bar segments, we'll add one more row of data to our table. Below the Average row, we'll add a row called "Upper bound." The value in those cells is calculated as "50-[average value]".

Add an "Upper bound" row to the data set in preparation for building the quadrant lines.
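To make the arithmetic behind the "Average" and "Upper bound" rows concrete, here is a minimal Python sketch. It assumes both axes run from 0 to 50 (the article states the maximum of 50; the zero baseline is an assumption), uses invented sample points rather than the article's data, and bar_lengths is a hypothetical helper name.

```python
from statistics import mean

AXIS_MAX = 50  # both axes top out at 50 in the article's chart

# Hypothetical (x, y) pairs: months employed, months rated above average.
# Invented for illustration; not the article's dataset.
points = [(36, 20), (12, 9), (24, 10), (18, 6)]

avg_x = mean(x for x, _ in points)
avg_y = mean(y for _, y in points)


def bar_lengths(average, axis_max=AXIS_MAX):
    """Split an axis into the two segments that meet at the average."""
    lower = average             # from the zero baseline up to the average
    upper = axis_max - average  # the "Upper bound" row: 50 - average
    return lower, upper


x_lower, x_upper = bar_lengths(avg_x)
y_lower, y_upper = bar_lengths(avg_y)
print("X:", avg_x, x_lower, x_upper)
print("Y:", avg_y, y_lower, y_upper)
```

Because the two segments always add up to the axis maximum, the resulting lines span the full width and height of the plot no matter where the average falls.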

Now, to add our error bars to the graph: click on the "average" data point in the chart, then go to the Chart Design > Add Chart Element option in the ribbon, and select "Error Bars > More Error Bars Options..."

Here's where to find the "More Error Bars Options" item in the drop-down menus.

This will open up a Format Error Bars pane on the right side of the screen. (At this point, Excel defaults to having you format the vertical error bars; we'll get to the horizontal bars in a minute.) You'll see options for "Direction" (select Both), "End Style" (select No Cap), and "Error Amount" (select Custom).

This screen shows the correct settings under "Format Error Bars."

After selecting the Custom option under Error Amount, click on the "Specify Value" button. In that popup window, tell Excel how far above and below your "average" point you want your vertical error bar to be drawn. That's why we created that "Upper bound" row of data earlier. For the "Positive Error Value" in the popup box, use the value from the "Upper bound" row, and for the "Negative Error Value," use the "AVERAGE" row value.

The vertical error bars are sized correctly, but the horizontal ones look like low-res wagon wheels. That's what we'll fix next.

Then, click directly on the horizontal error bars in the plot and follow the same steps to alter those bars, using the AVERAGE and Upper bound rows for the MONTHS OF EMPLOYMENT column.

(Note: these bars might be really small by default; if you can't click on them, then click anywhere in your graph, then go to the "Format" menu next to "Chart Design", and in the drop-down menu on the far left of the ribbon, select "Series 'average' X Error Bars".)

Quadrant lines are drawn, but could still use some re-formatting.

Once that's done, you can modify the format of the lines and the average point however you like. I prefer to push these reference lines visually toward the background, and to make the average point itself invisible by turning off its data marker. (Careful! Don't delete the data marker entirely, because that will also make the error bars disappear.)

With some formatting changes, our error bars have become perfectly sized quadrant lines, faded nicely into the background.

By drawing quadrants, I can see right away that we have two employees from the Japan office in the top left who have been above average on performance often, even though their tenure is shorter than the group average. Conversely, the yellow marker in the bottom right shows me that one participant has been here for a long time, but is far below the group average in terms of above-average performance periods. I don't know if this is an interesting story, or the most important insight, but simply drawing the quadrants on the graph makes it easier for me to analyze, and talk about, some of the data points within it.
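The quadrant reading can also be checked numerically by comparing each point with the average point. The sketch below is illustrative only: the participants are invented placeholders, quadrant is a hypothetical helper name, and treating a value exactly equal to the average as "low" is an arbitrary choice.

```python
from statistics import mean

# Hypothetical participants: (label, months employed, months above average).
# Invented for illustration; not the article's dataset.
people = [
    ("P1", 36, 20),
    ("P2", 12, 9),
    ("P3", 24, 10),
    ("P4", 18, 16),
]

avg_x = mean(p[1] for p in people)
avg_y = mean(p[2] for p in people)


def quadrant(x, y, avg_x, avg_y):
    """Name the quadrant a point falls in relative to the average point.

    Values exactly equal to an average are counted as "low" here.
    """
    horizontal = "high X" if x > avg_x else "low X"
    vertical = "high Y" if y > avg_y else "low Y"
    return f"{horizontal}, {vertical}"


for label, x, y in people:
    print(label, quadrant(x, y, avg_x, avg_y))
```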

How to add trendlines to a scatter plot

Unlike drawing quadrant lines, drawing trendlines in Excel is fairly straightforward. Perhaps it's TOO straightforward, actually; it's awfully easy to put a trendline in a graph that doesn't have any particular basis in reality, or doesn't describe the actual trend in a way that would be useful for setting future expectations. Trendlines can also be attention-grabbing, adding visual clutter to a graph type that is already challenging for some viewers to accurately interpret. Typically, if there's a trend worth highlighting, it's already visible in the data without drawing an additional line on the graph itself.

Those concerns aside, a trendline can provide helpful context in certain situations. The graph we're looking at, for instance, may benefit from some visual guidance showing us how well we should expect our pilot program cadre to perform, based on their experience at the company. A linear trendline could use historical data to provide some of that insight.

To create a trendline that uses all of our data points, we'll create a fifth data series in addition to "United States," "Canada," "Japan," and "average." Right-click on the chart and choose "Select Data…"

In the popup window, add another data series that uses columns E and F (our original single-series data) as our X and Y values. I named it "for creating trendline."

Add another data series to the view so that Excel can use it to calculate a trendline across all data points.

This data series will be plotted on top of all the existing series, using Excel's default settings for size, color, and so on. In the view below, I've made the "for creating trendline" series markers larger and purple just so you can see them more clearly.

This purple series will be the basis for our trendline; eventually we'll make the markers invisible.

Believe it or not, we're almost done. All you have to do is right-click on any one of these markers and select "Add Trendline…" Excel will automatically add a linear trendline to your chart, and show you the Format Trendline menu on the right-hand side of the screen.

The red line in the view is the trendline Excel draws as its default choice.

If you know what you're doing with trendlines, here's where you'd make your preferred adjustments; for our purposes today, all I'm going to do is mute the color of the line, and then turn off the fill and border settings for my "for creating trendline" data markers.

Using color, size, and line style to de-emphasize the default linear trendline.
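For readers curious about what a linear trendline represents, the sketch below fits a straight line by ordinary least squares, which is, as far as I understand Excel's documentation, the method behind its linear trendline option. The sample values are invented for illustration and linear_trendline is a hypothetical helper name; this is not a reproduction of Excel's internals.

```python
def linear_trendline(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: months employed (x), months rated above average (y).
xs = [10, 20, 30, 40]
ys = [5, 9, 16, 18]

slope, intercept = linear_trendline(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

A fitted line always passes through the point of average X and average Y, which is one reason it sits comfortably alongside the quadrant lines drawn earlier.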

How to add data labels to a scatter plot

It's great that we've put our data on the graph…but what does that data really represent? If we care about more than just the overall distribution of data points, we should add data labels to some, if not all, of our points.

To add data labels to a scatter plot, just right-click on any point in the data series you want to add labels to, then select "Add Data Labels…" Excel will open up the "Format Data Labels" pane and apply its default settings, which are to show the current Y value as the label. (It will turn on "Show Leader Lines," which I usually turn off.)

By default, Excel will add the Y values as the data label, and will align it to the right of each data point.

Using just the checkboxes in the "Format Data Labels" pane, you can get your labels to include other values or combinations of values, including the Series name, the X value, and/or the Y value. In many cases, though, we'll want to customize our data labels to show other information than just those fields. For that, we'll use the "Value From Cells" option.

How to customize labels in a scatter plot

You can tell Excel to use any cell or series of cells as its source for Data Label information. For instance, let's label each of the markers in this chart with the first initial and last name of the pilot program participants. We can add a "Data Label" column to the right of the data table and use the CONCATENATE formula to create a cell in each row with this information.

You can use formulas to fill in your new "Data Label" column, or you can just manually add the data. In this version, we used the CONCATENATE formula to combine the first initial from the FIRST NAME column, a period, and the full LAST NAME column value for every row.
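As a rough illustration of that label logic, here is a short Python sketch. In the worksheet, something like =CONCATENATE(LEFT(B5,1),". ",C5) would be one way to write it, where B5 and C5 are hypothetical cell references for the first and last name (the actual columns are not stated in this article). The space after the period and the sample name are also assumptions, and data_label is a hypothetical helper name.

```python
def data_label(first_name, last_name):
    """Build a label from the first initial, a period, and the last name.

    The space after the period is an assumption about the desired format.
    """
    return f"{first_name[0]}. {last_name}"


# Hypothetical participant name, used only to show the output format.
print(data_label("Jane", "Doe"))
```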

Then, click on a data label in the existing scatter plot. That brings up the "Format Data Labels" pane, where you can change the settings in the "Label Contains" section to use "Value From Cells". Clicking on the "Select Range…" button brings up a popup window asking what cells to use for the data label information, and you can highlight all the cells in that brand-new column.

Select any specific range of cells to use as the source for your custom labels.

When you click OK, every data marker will be labeled.

All marks are now labeled with our custom values.

To reduce the amount of text in the graph, let's just keep labels for those participants who were far above or far below the trendline: in other words, overachievers and underachievers.

Double-click on one of the labels you want to remove, and then either delete it or set its Text Fill to "No Fill." Do the same to several other unwanted labels, until you're left with only high- and low-performing participants. (You can also click-and-drag labels slightly out of their default positions, to make labels with close-by neighbors easier to read.)

Only high and low performers are labeled in this view.
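In the article this selection was made by eye, but if you would rather decide numerically which labels to keep, one option is to measure each point's vertical distance from the trendline. The sketch below is only an illustration: the trendline coefficients, the cutoff of 5 months, and the participants are all invented assumptions, and labels_to_keep is a hypothetical helper name.

```python
# Hypothetical trendline coefficients and cutoff, chosen for illustration.
SLOPE, INTERCEPT = 0.5, 1.0
THRESHOLD = 5  # months above or below the line that counts as "far"

# Hypothetical participants: (label, months employed, months above average).
people = [
    ("P1", 40, 30),
    ("P2", 20, 11),
    ("P3", 30, 8),
    ("P4", 10, 7),
]


def labels_to_keep(people, slope, intercept, threshold):
    """Return labels of points at least `threshold` away from the line."""
    kept = []
    for label, x, y in people:
        residual = y - (slope * x + intercept)  # vertical distance from line
        if abs(residual) >= threshold:
            kept.append(label)
    return kept


print(labels_to_keep(people, SLOPE, INTERCEPT, THRESHOLD))
```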


How to put it all together

With the techniques described above, you should be well prepared to create a scatter plot in Excel that can be designed and formatted to support the specific story you intend to tell. Use the power pairing of color and words to help your audience see what you want them to notice and learn from your scatter plot. Feel free to include annotations inside or near the chart area itself, as well as large headline or takeaway text above your visualization to emphasize the key messages.

The final version of our visualization includes a takeaway title, annotations, a reference line, and custom labels on selected elements in our multi-series scatter plot.

Now, you're ready to create your own scatter plots in Excel! The data and the graphs you see in this post (along with some bonus content) can be downloaded here.

Check out our chart guide for more about scatter plots and other graph types; subscribe to our YouTube channel for Excel tutorials and lots of other videos; and follow our blog to get the latest in how-to instructions, graph makeovers, and other tips and tricks for communicating more effectively with data.