How to use Excel for Data Analysis

Data analysis, also known as analysis of data or data analytics, is a method to clean, inspect, transform, model and introspect data with the aim to discover and decipher new information, form suitable conclusions, edit and format data accordingly as well as to validate the decision making process in a suitable and efficient manner. Data analysis has multiple facets, constituting a variety of diverse approaches under a variety of names, in different domains of life, that help to perform a variety of operations and validate the decision making process.

Introduction

This section illustrates the extensively useful and diverse features that Excel has to offer for analyzing data.

  1. Sort: You can sort your Excel data on one column or multiple columns. You can sort in ascending or descending order, depending upon the objective of the operation.
  2. Filter: Filter your Excel data if you only want to display records that meet certain criteria, and can be used to conditionally view specific cells, on the application of a condition.
  3. Conditonal Formatting: Conditional formatting in Excel enables you to highlight cells with a certain color, depending on the cell’s value, depending upon the user’s criteria.
  4. Charts: A simple Excel chart can say more than a big sheet full of information. As you’ll see, creating charts in Excel is very easy and user-friendly, and is also used to make the data more colorful, presentable and interactive with user.
  5. Pivot Tables: Pivot table is one of Excel’s most sublime and interactive features for representing data. A pivot table allows you to extract the significance from a large and detailed data set and view information more concisely.
  6. Tables: Tables allow you to examine and inspect your data in Excel quickly and effortlessly.
  7. What-If Analysis: What-If Analysis in Excel allows the user to try out different values using formulas and can be used in a conditional manner.
  8. Solver: Excel includes a tool called Solver that uses diverse methods from the operations research to find optimal solutions for all kind of decision problems and thus, makes the data more practical and usable for user.
  9. Analysis ToolPak: The Analysis ToolPak is an Excel add-in program, that is essentially the basis of data analysis and examination and provides data analysis tools for financial, statistical and engineering data analysis and is used widely for data examination and inspection.

You don’t have to have a readily available statistical package or functional table available for performing all statistical functions for data analysis. Excel can perform several statistical tests and analyses using the various features provided by it, as mentioned above. First, make sure you have your Data Analysis Tool Pak installed for performing Data Analysis. You should see the option of Data Analysis on the far right of your tool bar. If you don’t see it, go to FILE and then choose OPTION option, and finally click on ADD INS and add the Analysis Tool.

data analysis excel

Learning about your own data

One nice thing about the Data Analysis tool is that it enables the user to perform a variety of operations at once, and makes examination of data very concise and easy. If you want a quick introduction to your data, it will give you a list of descriptives that describe and define your data. That information can be helpful for other types of analysis and thus makes, data inspection easy.

Let’s use a hypothetical file called ENG01.xls file (Examination scores for English in 11th Grade in XYZ School). If we wanted to get a quick overview of the SCORE variable, we can use the DESCRIPTIVE STATISTICS tool available in Data Analysis menu. Go to the DATA menu and click on the DATA ANALYSIS option. From the list of tools available in the menu, choose DESCRIPTIVE STATISTICS:

Highlight the column containing the SCORES for the examination. But some of these variables can be helpful. For example, when you do a regression under Data Analysis, which is discussed later, you want the Mean (average) and Median (middle value) to be fairly close together, in order to keep data concise and not varying to large ranges. You want your Standard Deviation to be less than the Mean. So in the above table, our Mean and Median are close together. The standard deviation is about 29 – which means that about 70 percent of the schools scored between 155 and 213. The below table provides data analysis for the Excel File ENG01.xls.

Correlation

Another good overview of your data is through using a correlation Matrix, which acquaints the user with the information of what variables tend to go up and down together and in what direction. It thus provides an easy way of representing the relationship of data, before moving onto regression of data. The correlation is measured by a variable called Pearson’s R, which ranges between -1 (indirect relationship) and 1 (perfect relationship).

data analysis excel

Go to the DATA table and the DATA ANALYSIS option and choose CORRELATIONS option. Choose the range of all the columns, barring header, which you want to compare:

You get a table that compares each variable to all other variables. Because the relationship between two columns is the same, containing the similar nature of value, no matter which direction you compare. Excel gives you the value one value for the comparison. Below you see that the correlation between Column 3 and Column 1 is -0.43348. Correlation provides a general indicator of any relationship between two variables, depending upon the value stored in the cells, but it doesn’t allow you to let you predict one variable based on another, as it is not able to provide any computation on the basis of the values stored. To do that, you need an operation of data analysis called linear regression, also known as ordinary least squares regression.

The relationship between cells helps in generating new information, used to gain data for analysis and inspection. For example, people growing up in a lower-income family would probably score lower on standardized tests than those coming from well-established homes. Regression helps us view the relationship and even say about how much characteristics affect another, and can be used to develop new connections amongst the provided data. To start, you need to determine which characteristics stored in cells constituting independent variables, amongst which no independent relationship exists. These are the predictors. Next, the characteristics they help predict are the dependent variables, constituting existing relationship amongst data. When you develop and test a regression, you get a result known as an R-square. That will help you see how much the independent variable predicts the dependent variable.

Let’s Regress

Go to the DATA tab and click on the DATA ANALYSIS menu, and then click on Regression in the dialog box opened:

On choosing Regression from the list, Excel will, next ask you to highlight the range of cells for the X and Y ranges, on which data analytic tool would be applied. The Y is hypothetically DEPENDENT variable and X is any INDEPENDENT variable. Check Confidence Level and leave at 95 percent (usually taken) – the common level used for social science research and investigation. Check New Worksheet Ply so the output goes into a new worksheet.

The output produced is highly complex and large. But for now, let’s pay attention to a couple of variables, namely, Adjusted R Square and lets study the significance:

R Square is a characteristic feature developed by Regressive Statistical Data Analytic Tool that tells the user how much of the change in your DEPENDENT variable can be explained by an INDEPENDENT variable. In this case, only about 16 percent of the change in SCORE can be explained. An R Square value always lies between 0 and 1 – the closer to 1, the stronger the relationship. The SIGNIFICANCE variable is 0.000. If this is less than 0.05, it means your results are significant and reported at longer precision – or that they didn’t just occur by chance.

Using this information, Excel can tell us whether a school is doing better or worse than it should given the percent of poor students, and thus can be used for detailed data analysis. Under RESIDUALS, check RESIDUALS, STANDARDIZED RESIDUALS and then, LINE FIT PLOTS. The residuals tell you how well a school did:

The first school had a score of 210.5, which is 6.77 points better than it should given its poverty level. The STANDARDIZED RESIDUAL tells you the same information in terms of standard deviation. The plot gives you a graphic image of your model, which can be used to draw a variety of inferences.

Comments are closed.


Magoosh blog comment policy: To create the best experience for our readers, we will only approve comments that are relevant to the article, general enough to be helpful to other students, concise, and well-written! 😄 Due to the high volume of comments across all of our blogs, we cannot promise that all comments will receive responses from our instructors.

We highly encourage students to help each other out and respond to other students' comments if you can!

If you are a Premium Magoosh student and would like more personalized service from our instructors, you can use the Help tab on the Magoosh dashboard. Thanks!