What is Classification of Data?
Classification of data refers to the process of organizing the data in hand into homogeneous groups, categories, sub-groups and sub-categories according to their common properties or resemblance. It takes place after the data has been edited.
In simple words, when raw data is arranged into various classes, the process is termed classification. The classes are decided on the basis of the nature, objectives, and scope of the enquiry.
Characteristics of Classification
Classification of data is characterized by:
- In classification, data is arranged into homogeneous groups.
- Data is classified on the basis of similarity in their characteristics or inherent features.
- It indicates unity in diversity.
- It may be actual or notional.
- The data is classified as per certain measurable or non-measurable characteristics.
Objectives of Classification
- To condense the raw data into a precise and orderly form for statistical analysis.
- To facilitate comparison between data sets and to draw inferences from them.
- To give a quick overview of the significant features of the data.
- To highlight relevant items and give less weight to irrelevant ones.
- To prepare the data for further statistical analysis.
- To reveal patterns of variation and outline the characteristics of the variables present in the data.
- To provide information about the relationships between various elements of the data set.
Requisites for Ideal Classification
The requisites of classification are essentially the rules that a good classification of data should follow:
It should be unambiguous: Classes should be defined rigidly so as to avoid any ambiguity. There should be no room for doubt or confusion about which class an observation belongs to.
It should be exhaustive: Every item of data must belong to some class. An ideal classification is free from residual classes such as "others" or "miscellaneous", since such classes do not state the characteristics of their items clearly and completely.
It should be mutually exclusive: The classes should not overlap, so that each item belongs to one and only one class.
It should be homogeneous: A classification is homogeneous when the items placed in each class are similar to one another.
It should be stable: The same classification scheme should be used throughout the analysis and in any future enquiries on the same subject. When the classes do not change from one investigation to the next, the results of different investigations can be compared easily.
It should be suitable for the purpose: The classification must be made keeping in mind the purpose of the enquiry.
It should be flexible: The classification should be easy to adjust to new situations and conditions. This is achieved by keeping the major classes unchanged while allowing changes in the subclasses to a certain extent, so that the classification remains stable while still being flexible.
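The exhaustive and mutually exclusive requirements can be checked mechanically when the classes are numeric intervals. The Python below is a minimal sketch, assuming half-open class intervals [lower, upper), in which the upper limit is excluded as in the exclusive method. The helper names and the sample ages are hypothetical. Note that it only checks the observations actually present, not every possible value.

```python
from typing import List, Tuple

# Hypothetical representation: each class is a half-open interval [lower, upper),
# so a value equal to a class's upper limit belongs to the next class.
Interval = Tuple[float, float]


def matching_classes(value: float, classes: List[Interval]) -> List[int]:
    """Return the indices of every class that contains `value`."""
    return [i for i, (lower, upper) in enumerate(classes) if lower <= value < upper]


def is_exhaustive(values: List[float], classes: List[Interval]) -> bool:
    """True if every observation falls into at least one class."""
    return all(matching_classes(v, classes) for v in values)


def is_mutually_exclusive(values: List[float], classes: List[Interval]) -> bool:
    """True if no observation falls into more than one class."""
    return all(len(matching_classes(v, classes)) <= 1 for v in values)


ages = [3, 10, 17, 25, 40, 59]  # invented observations

good = [(0, 20), (20, 40), (40, 60)]
overlapping = [(0, 20), (15, 40), (40, 60)]  # 17 falls into two classes
incomplete = [(0, 20), (20, 40)]             # 40 and 59 fall into no class

print(is_exhaustive(ages, good), is_mutually_exclusive(ages, good))                # True True
print(is_exhaustive(ages, overlapping), is_mutually_exclusive(ages, overlapping))  # True False
print(is_exhaustive(ages, incomplete), is_mutually_exclusive(ages, incomplete))    # False True
```

Using half-open intervals is one common way to keep adjacent classes such as 0-20 and 20-40 from overlapping at their shared limit.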
Basis of Classification of Data
In geographical classification, the data is classified on the basis of location, i.e., the physical features of an area or the regional differences between items of the data set, such as village, city, or district. It is also called spatial classification. The classes are commonly listed in alphabetical order.
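As a minimal sketch of geographical classification, the Python below groups observations by place and lists the resulting classes alphabetically. The records, city names, and the `classify_by_place` helper are invented for illustration.

```python
from collections import defaultdict

# Invented records for illustration: (city, units sold)
records = [
    ("Pune", 120), ("Delhi", 340), ("Agra", 90),
    ("Delhi", 160), ("Pune", 80), ("Agra", 60),
]


def classify_by_place(rows):
    """Group values by location; return the classes in alphabetical order of place."""
    classes = defaultdict(list)
    for place, value in rows:
        classes[place].append(value)
    return {place: classes[place] for place in sorted(classes)}


for place, values in classify_by_place(records).items():
    print(place, values, "total:", sum(values))
# Agra [90, 60] total: 150
# Delhi [340, 160] total: 500
# Pune [120, 80] total: 200
```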
Classification of data according to the time of occurrence, arranged from the earliest period to the latest, is called chronological classification. It is also called temporal classification. Examples include national income over a series of years, the annual output of rice, and the monthly expenditure of a household.
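A chronological classification amounts to ordering observations by period. The sketch below assumes invented annual output figures keyed by year, and the helper names are hypothetical. Once the data is in time order, the change from one period to the next can be read off directly.

```python
# Invented figures for illustration: annual output (tonnes), recorded out of order
annual_output = {2022: 540, 2019: 480, 2021: 515, 2020: 470}


def classify_chronologically(series):
    """Return (period, value) pairs from the earliest period to the latest."""
    return sorted(series.items())


def period_changes(ordered):
    """Change from each period to the next; meaningful only for time-ordered data."""
    return [(curr[0], curr[1] - prev[1]) for prev, curr in zip(ordered, ordered[1:])]


ordered = classify_chronologically(annual_output)
print(ordered)                  # [(2019, 480), (2020, 470), (2021, 515), (2022, 540)]
print(period_changes(ordered))  # [(2020, -10), (2021, 45), (2022, 25)]
```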
Classification of data on the basis of certain conditions is termed conditional classification. Items that meet a given condition are placed in the class defined by that condition.
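Conditional classification can be sketched as a list of labelled conditions checked in order. In this hypothetical example, students' marks are placed into classes by score range. The thresholds, labels, and marks are assumptions for illustration only. Stopping at the first matching condition ensures each item lands in only one class, and raising an error when no condition matches flags a classification that is not exhaustive.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical conditions, checked in order: (class label, condition on marks)
Condition = Tuple[str, Callable[[int], bool]]

conditions: List[Condition] = [
    ("Distinction", lambda m: m >= 75),
    ("Pass", lambda m: 40 <= m < 75),
    ("Fail", lambda m: m < 40),
]


def classify_by_conditions(items: List[int], conditions: List[Condition]) -> Dict[str, List[int]]:
    """Place each item in the first class whose condition it meets."""
    classes: Dict[str, List[int]] = {label: [] for label, _ in conditions}
    for item in items:
        for label, condition in conditions:
            if condition(item):
                classes[label].append(item)
                break  # stop at the first match, so an item joins only one class
        else:
            # No condition matched: the classes are not exhaustive for this item
            raise ValueError(f"{item} meets none of the conditions")
    return classes


marks = [82, 35, 67, 40, 91, 12]  # invented marks out of 100
print(classify_by_conditions(marks, conditions))
# {'Distinction': [82, 91], 'Pass': [67, 40], 'Fail': [35, 12]}
```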
When data is classified on the basis of descriptive characteristics or attributes such as literacy, region, education, marital status, or colour, it is called qualitative classification. It deals with data that cannot be measured numerically. It can be performed in two ways:
- Simple classification: Only one attribute is considered, and the data is divided into just two classes according to it, such as male and female, or single and married. It is also called dichotomous or twofold classification.
- Manifold classification: The data is divided into more than two classes or subclasses, which may in turn be further subdivided.
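The difference between simple and manifold classification can be illustrated with a few invented survey records. In this sketch, whose helper names are hypothetical, the simple classification splits respondents on one two-valued attribute (gender), while the manifold classification subdivides each of those classes further by marital status.

```python
from collections import Counter

# Invented respondents: (gender, marital status)
people = [
    ("male", "married"), ("female", "single"), ("female", "married"),
    ("male", "single"), ("female", "married"), ("male", "widowed"),
    ("female", "widowed"),
]


def simple_classification(rows, index):
    """Dichotomy: count the two classes of one two-valued attribute."""
    counts = Counter(row[index] for row in rows)
    if len(counts) != 2:
        raise ValueError("a simple classification needs exactly two classes")
    return dict(sorted(counts.items()))


def manifold_classification(rows, indices):
    """Split on the first attribute, then subdivide each class by the remaining ones."""
    if not indices:
        return len(rows)  # leaf: number of items in this subclass
    first, rest = indices[0], indices[1:]
    groups = {}
    for row in rows:
        groups.setdefault(row[first], []).append(row)
    return {key: manifold_classification(groups[key], rest) for key in sorted(groups)}


print(simple_classification(people, 0))
# {'female': 4, 'male': 3}
print(manifold_classification(people, [0, 1]))
# {'female': {'married': 2, 'single': 1, 'widowed': 1}, 'male': {'married': 1, 'single': 1, 'widowed': 1}}
```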
Based on the number of attributes used, qualitative classification is of two types:
- One-way classification: When the classification is based on a single attribute, it is called one-way classification.
- Multiway classification: When the classification is based on two or more attributes, it is called multiway classification.
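One-way and multiway classification can be sketched as frequency counts over one attribute, or over combinations of several attributes. The responses and helper names below are invented for illustration. The two-way case is printed as a small table with one row per region.

```python
from collections import Counter

# Invented survey responses: (region, literacy)
responses = [
    ("urban", "literate"), ("rural", "illiterate"), ("urban", "literate"),
    ("rural", "literate"), ("urban", "illiterate"), ("rural", "illiterate"),
    ("urban", "literate"),
]


def one_way(rows, index):
    """One-way classification: frequency of each value of a single attribute."""
    return dict(sorted(Counter(row[index] for row in rows).items()))


def multiway(rows, indices):
    """Multiway classification: frequency of each combination of several attributes."""
    return Counter(tuple(row[i] for i in indices) for row in rows)


print(one_way(responses, 1))  # {'illiterate': 3, 'literate': 4}

# Two-way table: regions as rows, literacy as columns (absent combinations count as 0)
table = multiway(responses, [0, 1])
columns = ["illiterate", "literate"]
print("region", *columns)
for region in ["rural", "urban"]:
    print(region, *(table[(region, col)] for col in columns))
```

Because `Counter` returns 0 for missing keys, combinations that never occur still appear in the table as zero counts.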
In quantitative classification, the data is classified on the basis of characteristics that can be measured numerically, such as height, weight, expenditure, production, sales, and age.
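For measurable data, the classes are usually class intervals, and counting the observations in each interval gives a frequency distribution. The sketch assumes equal-width intervals that include the lower limit and exclude the upper one (the exclusive method). The weights and the `frequency_distribution` helper are hypothetical.

```python
from collections import Counter


def frequency_distribution(values, lower, width, n_classes):
    """Count values in equal-width classes [lower + k*width, lower + (k+1)*width)."""
    counts = Counter()
    for v in values:
        k = int((v - lower) // width)  # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[k] += 1
    return [(lower + k * width, lower + (k + 1) * width, counts[k]) for k in range(n_classes)]


weights = [52, 67, 45, 71, 58, 49, 63, 60, 55, 74]  # invented weights in kg
for lo, hi, freq in frequency_distribution(weights, 40, 10, 4):
    print(f"{lo}-{hi} kg: {freq}")
# 40-50 kg: 2
# 50-60 kg: 3
# 60-70 kg: 3
# 70-80 kg: 2
```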
Classification is performed to make data simpler and more concise, and to arrange it in a logical manner.