This section shows you how KazAnova operates, what it looks like, what its main elements are, and which theoretical underpinnings you should be aware of. I hope this section is not too tiring, but it is a necessary evil.
At the moment, KazAnova can import only 3 different types of files.
The headings must always be in the first row, no matter which method you use to import your file. Every column needs a heading.
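If you want to check a file before importing it, the heading rule can be sketched in a few lines of Python. This is only an illustration of the rule, not part of KazAnova; the function name missing_headings is my own.

```python
def missing_headings(header_row):
    """Return the 1-based positions of columns whose heading is empty or blank."""
    return [i for i, name in enumerate(header_row, start=1) if not name.strip()]
```

An empty result means every column in the first row has a heading.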
Now you should be able to see a number of variables (columns), starting with “ID” and ending with “Target”. Some of the variables seem to be numeric, like “AGE”, and others appear to be strings (i.e. they contain alphabetic characters), like “REGION”. KazAnova classifies these variables into categories based on a pre-specified logic when importing the file.
Before I explain the menu bar, go to the “Pre-process” menu and select “Manipulate”. You should see a window similar to the one below:
Take note of the symbols in the Variables list panel. If you scroll down, you will see that each of your columns (variables) has a symbol icon linked to it. What are these mystical symbols?
So:
You should know that KazAnova reads the first 2,000 lines and tries to predict the classification of each variable from them. With bigger sets, the assigned classification may therefore be wrong (e.g. if the first 2,000 lines contain numeric values but line 2,001 contains alphabetic characters by mistake, the classification will still be numeric). If that happens, do not panic: there are easy ways to fix it, although they are always indirect. That is, you cannot manually change the classification of the variable, but you can create a new variable that has the same values (with the incorrect ones replaced) and a different classification.
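To make the sampling behaviour concrete, here is a minimal Python sketch. It is not KazAnova's actual code: the function names, the two labels "numeric" and "string", and the rule "a column is numeric if every sampled non-empty value parses as a number" are all my assumptions. Only the 2,000-line sample size comes from the text above.

```python
SAMPLE_ROWS = 2000  # sample size mentioned in the tutorial


def is_number(text):
    """Return True if text parses as a number written with a dot decimal."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_types(rows, header, sample_rows=SAMPLE_ROWS):
    """Label each column 'numeric' or 'string' using only the first sample_rows rows."""
    kinds = {name: "numeric" for name in header}
    for i, row in enumerate(rows):
        if i >= sample_rows:
            break  # rows beyond the sample are never inspected
        for name, value in zip(header, row):
            # Assumption: empty cells do not affect the classification.
            if value != "" and not is_number(value):
                kinds[name] = "string"
    return kinds
```

With 2,000 clean rows followed by one row that has letters in “AGE”, this sketch still labels “AGE” as numeric, which is exactly the kind of misclassification described above. Raising sample_rows past the bad row changes the label to string.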
Another thing you need to know is that the software treats the dot (“.”) as the decimal separator, NOT the comma (“,”), and this cannot be changed. In this tutorial's screenshots you will see it the other way around, but that is only on my machine; I mention it because where I come from the comma is the decimal separator. You therefore need to make sure that all numeric variables with decimals use a dot (you could easily open the file in Notepad and replace the commas with dots). The software also does not understand a thousands separator (normally a comma), so you need to avoid it.
Back in the menu bar, you can see the following tabs:
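If a blind find-and-replace in a text editor worries you (it would also change commas inside text columns), the clean-up can be sketched in Python instead. This is only an illustration under my own assumptions: the input uses the comma as the decimal separator and, if at all, the dot as the thousands separator, and the file is tab delimited. The names to_dot_decimal and convert_line are hypothetical.

```python
import re

# A number written with a decimal comma, e.g. "3,14", "1234,5" or "-1.234,56".
# Assumption: in such values "." appears only as a thousands separator.
DECIMAL_COMMA = re.compile(r"[+-]?(\d{1,3}(\.\d{3})+|\d+),\d+")


def to_dot_decimal(field):
    """Rewrite a decimal-comma number with a dot decimal; leave other fields alone."""
    if DECIMAL_COMMA.fullmatch(field):
        # Drop thousands separators first, then switch the decimal mark.
        return field.replace(".", "").replace(",", ".")
    return field


def convert_line(line, delimiter="\t"):
    """Apply to_dot_decimal to every field of one delimited line."""
    return delimiter.join(to_dot_decimal(f) for f in line.split(delimiter))
```

Fields that do not look like decimal-comma numbers, such as a text value containing a comma, pass through unchanged. Note that a value like 1,234 is ambiguous; under the assumption above it is read as one point two three four.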
Starting with the First Tab:
If some of the letters in the above frame seem all Greek to you, it is because they are Greek!
This process will create a tab-delimited .txt file. There is no special file format for KazAnova: every time you save your work, the result is a tab-delimited .txt file that can be opened in the same way you imported the current one. Note that every time you save, you need to choose a name again (you can reuse the same one if you want). This is because there are no “go back” or “go forward” functions in KazAnova, and specific actions (e.g. sorting the file) cannot be undone unless you have a previously saved version of the file.
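Because the saved file is plain tab-delimited text with the headings in the first row, other tools can read and write it too. Here is a minimal Python sketch using the standard csv module; the function names and the UTF-8 encoding are my assumptions, not something KazAnova prescribes.

```python
import csv


def save_tab_delimited(path, header, rows):
    """Write the header row followed by the data rows as tab-delimited text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


def load_tab_delimited(path):
    """Read a tab-delimited file and return (header, rows) as lists of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        return header, list(reader)
```

Saving under a new name before a destructive step such as sorting gives you a version to fall back on, which is the closest thing to an undo you will get.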
A few words about the other tabs:
In my experience, I spend over 80% of the modeling time in the Pre-process tab and in putting the results together in a nice presentation, rather than running the actual modeling algorithm (like regression). The Pre-process tab offers the following functionalities. You do not need to understand any of them right now; I am only listing them here for future reference, for when you feel more familiar with the software and would like to explore more of it.
The Graph Tab has 4 elements:
The Models section contains many different statistical algorithms used in prediction and classification problems, in a very generic (a bit unfriendly, if you like) form.
The Scorecard tab (where we play a lot) focuses more on visualizing the results of statistical algorithms and making them easier to interpret and report. It contains scorecards and decision trees for binary or even continuous outcomes.
The Report tab shows the cross tabulation of all the chosen variables against a binary variable. It also displays various statistics on the predictiveness of those variables (their relationship with the target variable).
The help function is not actually implemented yet, so you had better pay attention to this tutorial!
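If cross tabulation is new to you, here is a minimal Python sketch of the idea: count how often each value of a variable occurs together with each value of a binary target. The function name cross_tab and the "rate" column (the share of 1s within each category) are my own illustration; I am not claiming that this is one of the statistics the Report tab actually displays.

```python
from collections import Counter


def cross_tab(values, target):
    """Cross-tabulate a variable against a 0/1 target, one row per category."""
    counts = Counter(zip(values, target))
    table = {}
    for category in sorted(set(values)):
        zeros = counts[(category, 0)]
        ones = counts[(category, 1)]
        # Share of target == 1 within this category (illustrative statistic only).
        table[category] = {"0": zeros, "1": ones, "rate": ones / (zeros + ones)}
    return table
```

A variable whose categories have very different rates is more strongly related to the target than one whose categories all show about the same rate.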
That is the end of Tutorial 1. Tutorial 2 will be much more interesting because we will start taking data-related actions.