The Education of a Data or Business Intelligence Analyst, Part I – Introduction to Recommended University Classes

More and more employers are looking for Data Analysts or Business Intelligence Analysts in order to get meaningful information out of the data they collect from their operations. According to Indeed.com, the average salary for a Data Analyst in Dallas, TX is $63,000, while the average salary for a Business Intelligence Analyst in Dallas, TX is $86,000 .

find out what Business Intelligence actually is here

However, it is often not clear what one needs to study to work in either field. Below is a quick list of all of the university classes that you should take to become a Data Analyst or Business Intelligence Analyst. Keep in mind, you do not need to take all of the classes below, but the more you take the easier your career will be.

In future posts, I will go into more detail about each of the general subjects shown below as well as discuss the type of certifications one should pursue.

Calculus
Calculus I
Calculus II
Calculus III – Multivariable Calculus
Differential Equations
Partial Differential Equations
Other Mathematics
Linear Algebra
Discrete Mathematics
Formal Logic
Probability Theory and Statistics
Introduction to Probability Theory using Multivariable Calculus
Introduction to Statistics
Computers and Programming
Introduction to Programming
Data Structures
Object Oriented Programming
Relational Databases
Introduction to Operating Systems
Introduction to Networking
Business
Micro Economics
Corporate Finance
Introduction to Accounting

Maybe SkyNet Will Be Written In Excel VBA After All … OR … Why ThisWorkbook Can Save You Headaches With Your Macros


In my last post
, I joked that SkyNet from The Terminator would not be written in Excel VBA because you cannot make Excel VBA code aware that it exists within a particular workbook.

Turns out that I may have been wrong about SkyNet and that I was very wrong about needing to create a workbook object to ensure your code looks at the workbook in which it exists.

There is a way to force your macros to look at the workbook in which they exist.

You use the ThisWorkbook property.

OK, Give Me An Example

Of course. For the example below, create a workbook called “WhyThisWorkbookIsOKWithMacroCode.xlsm” and place the code below in a VBA module.


Sub NotSoConfusingThisWorkbook()

Dim whatYouThinkIsActiveWorkbook As Workbook
Dim whatActuallyIsActiveWorkbook As Workbook

Set whatYouThinkIsActiveWorkbook = Workbooks("WhyThisWorkbookIsOKWithMacroCode.xlsm")
Set whatActuallyIsActiveWorkbook = Workbooks.Add

whatYouThinkIsActiveWorkbook.Worksheets("Sheet1").Cells(1, 1) = "this is what you think you will see"
whatActuallyIsActiveWorkbook.Worksheets("Sheet1").Cells(1, 1) = "this is what you actually see since activeWorkbook confused you"

whatActuallyIsActiveWorkbook.Activate
Debug.Print ActiveWorkbook.Worksheets("Sheet1").Cells(1, 1)

' show how ThisWorkbook works
Debug.Print "Now Let's try it with ThisWorkbook"
Debug.Print ThisWorkbook.Worksheets("Sheet1").Cells(1,1)

Set whatYouThinkIsActiveWorkbook = Nothing
Set whatActuallyIsActiveWorkbook = Nothing

End Sub

As you can see after running this code, while ActiveWorkbook looks at the last activated or clicked workbook, ThisWorkbook refers back to the workbook in which the VBA Macro code exists.

SkyNet did not become self aware because it was written in Excel VBA … OR … Why you should almost never use ActiveWorkbook and ActiveSheet

NOTE: I wrote this without knowing about “ThisWorkbook”, which solves the workbook but not the worksheet problem. You can read about my discovery here.

“The SkyNet Funding Bill is passed. The system goes on-line August 4th, 1997. Human decisions are removed from strategic defense. SkyNet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th” – The Terminator

“Three billion human lives ended on August 29th, 1997. The survivors of the nuclear fire called the war Judgment Day.” – Sarah Connor

Fortunately, this did not come to pass thanks to the Federal Government insisting that SkyNet be written in VBA since “They like working in Excel”.

While there is endless debate on whether artificial intelligence could become self-aware and a threat to humanity and while many great films (and even more awful ones) have been made about this, I am here to say that you will never have to worry about this happening with code written in Excel VBA.

OK, maybe I am exaggerating a bit, but one very aggravating thing that I have had to deal with Excel VBA is that the VBA code does not understand that it exists within a workbook. You have to explicitly tell it constantly. I will often have a public variable:

Public Const pubConstMacroWBName As String = "TheMacroWorkbook.xlsm"

That I refer too in most of my functions and procedures as:

Set wbMacro = Worksbooks(pubConstMacroWBName )

In order to ensure that the code knows which workbook the code actually exists.

The Difficulties of using ActiveWorkbook and ActiveSheet

When one is just starting out with the VBA Object Model, it is very tempting to use the following two properties

  1. ActiveWorkbook
  2. ActiveSheet

These two properties illustrate Excel VBA’s lack of self-awareness well. Using them tends to come back to haunt people, since these two objects are very unreliable. There value changes depending on whatever workbook and worksheet was last clicked or had the select/activate method used on it, which can cause code to not work as intended.

ActiveWorkbook is not the workbook where the excel VBA code is. It’s the last one clicked or had the activate method used on it with VBA.

ActiveSheet is the sheet you last clicked on or last used the select method on, not where VBA code located in a worksheet module resides. If you used it a worksheet module it can also cause problems.

These two facts provide another reason why “click independent” VBA programming is essential for reliable Excel Macros.

Below I am going to give 2 simple examples

  1. ActiveWorkbook causing confusing behavior
  2. ActiveSheet causing confusing behavior

ActiveWorkbook causing confusing behavior

Create a new workbook and save it as “WhyActiveWorkbookIsBadWithMacroCode.xlsm”.

Type Alt-F11 to get the VBA window.

Click Insert -> Module.

Copy and paste the code below into the module, then click one of the lines of it and press F5.

Sub ConfusingActiveWorkbook()

Dim whatYouThinkIsActiveWorkbook As Workbook
Dim whatActuallyIsActiveWorkbook As Workbook

Set whatYouThinkIsActiveWorkbook = Workbooks("WhyActiveWorkbookIsBadWithMacroCode.xlsm")
Set whatActuallyIsActiveWorkbook = Workbooks.Add

whatYouThinkIsActiveWorkbook.Worksheets("Sheet1").Cells(1, 1) = "this is what you think you will see"
whatActuallyIsActiveWorkbook.Worksheets("Sheet1").Cells(1, 1) = "this is what you actually see since activeWorkbook confused you"

whatActuallyIsActiveWorkbook.Activate
Debug.Print ActiveWorkbook.Worksheets("Sheet1").Cells(1, 1)

Set whatYouThinkIsActiveWorkbook = Nothing
Set whatActuallyIsActiveWorkbook = Nothing

End Sub

If not already open, open the immediate window by typing Ctrl+G. Look at the output in the Immediate window. It is not what you would think it was intuitively.

Why is this a problem?

If your macro is opening and closing multiple workbooks, you can easily find yourself
referencing the wrong workbook.

ActiveSheet causing confusing behavior

Make sure there are two worksheets, “Sheet1” and “Sheet2” in your workbook from above “WhyActiveWorkbookIsBadWithMacroCode.xlsm”.

In VBA project explorer, click on “Sheet1”. Copy and paste the code below into it. then click one of the lines of it and press F5.

Sub ConfusingActiveWorksheet()

Dim wbMacro As Workbook

Dim whatYouThinkIsActiveWorksheet As Worksheet
Dim whatActuallyIsActiveWorksheet As Worksheet

Set wbMacro = Workbooks("WhyActiveWorkbookIsBadWithMacroCode2.xlsm")

Set whatYouThinkIsActiveWorksheet = wbMacro.Worksheets("Sheet1")
Set whatActuallyIsActiveWorksheet = wbMacro.Worksheets("Sheet2")

whatYouThinkIsActiveWorksheet.Cells(1, 1) = "this is what you think you will see"
whatActuallyIsActiveWorksheet.Cells(1, 1) = "this is what you actually see since activeSheet confused you"

' pretend this is you clicking on the worksheet "Sheet2"
whatActuallyIsActiveWorksheet.Select
Debug.Print ActiveSheet.Cells(1, 1)

Set whatYouThinkIsActiveWorksheet = Nothing
Set whatActuallyIsActiveWorksheet = Nothing
Set wbMacro = Nothing

End Sub

If not already open, open the immediate window by typing Ctrl+G. Look at the output in the Immediate window. Again, It is not what you would think it was intuitively.

Why is this a problem?

This can be a problem if you have a worksheet macro that moves between multiple sheets. Though I avoid worksheet macros that are not events specific to that sheet, you may be stuck with a badly put together workbook with code that traps you in this. Unless you reference the explicit name of the sheet, you can quickly find that your macro goes to the wrong place.

OK, are ActiveWorkbook and ActiveSheet useful for anything?

Yes. If you are debugging and you want to find out information quickly, ActiveWorkbook and ActiveSheet are very useful in the immediate window.

If not already open, open the immediate window in the VBA window by clicking View -> Immediate Windows (or type Ctrl-G).

In the immediate window, type “? ActiveWorkbook.Name”. Look at the results.

Type “? ActiveSheet.Name”. Look at the results.

Note that VBA provides a drop down list of other attributes and methods.

Both can come in handy, but be very skeptical of them outside of the immediate window. Unless you are looking up values quickly in the immediate window, I strongly recommend avoiding these two properties.

An Explanation In Plain English Of What Business Intelligence Actually Is, With Two Real World Examples

OK, Explain It To Me

Companies and organizations now feel that they need “Business Intelligence” (BI) to make sense of the data that they gather in order to make better decisions. However, they can be awfully vague on what that means. Before I got hired to work in Business Intelligence, I used to wonder about what on earth they were talking. So what are employers actually doing when they say that they are creating “Business Intelligence”?

The answer is straight forward once you get past the buzzwords:

Business Intelligence means to use SQL queries and scripting to extract information that a human being can easily digest from an organization’s database or databases.

That’s It?

Yes, that’s it. It’s a branch of data analysis.

Fantastic, I am going to set up Business Intelligence for my organization tomorrow!

Now hold on there. Just because one can explain BI easily in one sentence does not mean that it is easy to do. There are at least four major barriers.

  1. The SQL queries can be very complex. They can take days or even weeks to write.
  2. Scripting, using a stats package like SAS or R, or a reporting program like Crystal Reports is often needed in addition to SQL to get the answers you need and to present it in an easily digestible format.
  3. The previous two statements assume that the databases, querying applications, and other software have been set up well in the first place, which often takes a large staff beyond those in an organization’s BI section.
  4. The data needed to do the analysis may not be currently collected. Developers’ of live business databases main priority is usually to ensure that an organization’s information infrastructure runs smoothly from day to day. It is very likely that it never occurred to your organization’s developers to record the data points that you need.

OK, fine, it’s tougher that it looks, but I want to see these real world examples so I can have some idea what I should expect.

No problem.

I can actually give you two good examples of Business Intelligence which I helped create. I used to work for the marketing company LivingSocial as a Business Intelligence Analyst. While most of my work was proprietary, I did BI for two articles for LivingSocial’s Blog.

  1. What are the Nicest Cities in the US? – discusses which of LivingSocial’s American cities have the largest percentage of gift purchases
  2. Who’s Leaving on a Jet Plane? – discusses which of LivingSocial’s American cities have the largest and smallest percentages of travel purchases

I wrote the SQL queries that returned the percentages and tables used in both blog posts. While both are simple examples of BI, they are very typical of the information that organizations wish to get from their data.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Related books picked – and if possible read – by me. Sponsored by Amazon Associates.

What is the Hive SQL COALESCE function, what does it do, and why on earth is it useful?

What is the Hive SQL COALESCE function?

There is a function called COALESCE that exists in Hive SQL. It has the following syntax.

COALESCE(field1, field2, … , fieldn)

What does it do?

Of the fields above (field1, field2, … , fieldn), for each record returned it returns the value of the first field of the ones listed in the COALESCE function that is not NULL. If all of the fields are NULL, it returns NULL.

For example, let’s say one had 3 date fields, datefield1, datefield2, and datefield3 from the table tblDates.

tblDates
primary_key datefield1 datefield2 datefield3
1 NULL NULL 1993-06-04

The code:

SELECT COALESCE(datefield1, datefield2, datefield3) as first_date_found
FROM
tblDates
WHERE
primary_key = 1

will return ‘1993-06-04’

OK, what on earth do I use this for?

You use it to create proxy values for NULLs in a field.

It is very useful when you have dirty data and have to use another field as an approximation of the dirty data’s actual value. If you have a field that is full of NULLs, you can use another field to put values in for those NULLs that you think provide a good approximate value of what should be there.

I want an example!

OK.

Let’s say you are William the Conqueror and have just decided to put together the Domesday Book, where you need to get everyone’s birth date so that you can figure out who and how much you can tax in your new conquered land of England and Wales. Let’s say to tax them that you need to know that they are legal adults over the age of 18 but birth records are not very good in 1066. Let’s also say you magically have an apache server with hadoop on it.

What do you do?

Let’s say your tblDomesDayProxyBirthDays table has 4 fields

table fields data status of the field
thouGreatethsOfKeyeths an 11th century version of a Social Security number
thouDayOfBirth You have some birthdays but not all
thouBaptism You have more baptisms but not all
thouLordRegistrationDate You have all the days they first registered with the local lord of the manor

Let’s say you have these three records in tblDomesDayProxyBirthDays.

tblDomesDayProxyBirthDays
thouGreatethsOfKeyeths thouDayOfBirth thouBaptism thouLordRegistrationDate
1 1048-02-15 1048-03-01 1048-04-01
2 NULL 1049-08-29 1049-10-29
3 NULL NULL 1050-07-01

Well, you need their birthday to figure out if they are adults you can tax. However, you do not always have their “thouDayOfBirth” as shown above.

COALESCE allows you to use other data from other fields as a proxy.

If you use the code:

SELECT
thouGreatethsOfKeyeths
, COALESCE(thouDayOfBirth, thouBaptism, thouLordRegistrationDate) AS proxy_bday
FROM
tblDomesDayBirthDays

You will get:

thouGreatethsOfKeyeths proxy_bday
1 1048-02-15
2 1049-08-29
3 1050-07-01

For the first subject, you have their actual birthday. For the second subject, you have their first day of kindergarten. For the third subject, you have the day they registered with the lord of the manor. You now have a birthday for all of your subjects, even if some of them are not as accurate as you would like.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Related books picked – and if possible read – by me. Sponsored by Amazon Associates.

Tabs are annoying in Macintosh OS X Terminal Command Line Unix since \t does not work. Here is how to put a Tab character in the Command Line.

When in Macintosh OS X Terminal Command Line Unix, you cannot use \t to represent a tab character. What you need to do is:

  1. Type Ctrl-V
  2. then type the actual keyboard button Tab

This will get you the tab character that you need.

Why Idiots At Computers and N00bs At Data Management Can Make Valuable Contributions To The Digital World

I am stuyding the JavaScript Bible by Danny Goodman with Michael Morrison
.
In the first chapter, there is a quote that I really liked. To put it in context, Goodman is talking about his dismay in 1995 that Java was to derived from C and C++ and meant for experienced programmers.

“I would have preferred a language that casual programmers and scripters who were comfortable with authoring tools, such as Apple’s once-formidable HyperCard and Microsoft’s Visual Basic, could adopt quickly. As these accessible development platforms have shown, nonprofessional authors can dream up many creative applications, often for very specific tasks that no professional programmer would have the inclination to work on. Personal needs often drive development in the classroom, office, den, or garage.” (emphasis mine)

I am also reading right now to the authorized Steve Jobs biography, Steve Jobs by Walter Isaacson. He said something quite similar when he was starting with Pixar:

“My view is that people are creative animals and will figure out clever new ways to use tools that the inventor never imagined. (page 241)”

I think that thess two quotes provide a valuable lesson about computers and information technology. Even though you might not have the formal training, that does not mean that you cannot make meaningful contributions to yourself, your employer, your clients, and the programming world at large. I am not sure that I would have come up with my idea about using areas of VBA classes to store and manipulate data if I had been more formally trained.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Related books picked – and if possible read – by me. Sponsored by Amazon Associates.

How To Join Strings In SQL For Hive

Use the CONCAT function. It can be a bit confusing because joining strings, or concatenating (hence the abbreviation CONCAT), is usually done with a symbol like ‘&’ or ‘+’ in other systems and languages.

The syntax is pretty straight forward.


CONCAT(string1, string2, ...)

Let’s say you you had three string fields, address, city, and state from table contact_info
that you wanted to join (concatenate) together with commas and spaces inbetween. You would use the following code:


SELECT CONCAT(address,', ',city,', ',state)
FROM
contact_info

facebooktwittergoogle_plusredditpinterestlinkedinmail

Related books picked – and if possible read – by me. Sponsored by Amazon Associates.

How To Fix New England, New Jersey, and Caribbean Zip Codes in Excel

Assuming the zip codes are in column E, place the following code in column F:


=IF(LEN(E2)<5,REPT("0",5-LEN(E2))&TEXT(E2,"#"),TEXT(E2,"#"))

Drag the formula down the column.

If you want to know why this works, click here.

If you want to know why this problem exists, click here.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Related books picked - and if possible read - by me. Sponsored by Amazon Associates.