In this paper, we classify and index hierarchical video genres with Support Vector Machines (SVMs) using audio features only. Segmentation parameters are extracted at the block level, which has the major benefit of capturing local temporal information. The main contribution of our study is a powerful combination of two audio descriptors, Mel Frequency Cepstral Coefficients (MFCC) and signal energy, used to classify a large YouTube dataset that covers video genres and sub-genres in multiple Arabic dialects: sports analysis and various match categories (football, basketball, handball and volleyball), studio and field news scenes, and music clips with multiple singers and instruments. The approach was validated on more than 18 hours of video, yielding a classification accuracy of 98.5% for genres, 97% for sports sub-genres and 76% for music sub-genres. Finally, we discuss the performance of SVM kernels on the proposed dataset.
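To make the idea of block-level extraction concrete, the following minimal sketch computes short-time signal energy per frame and then summarizes consecutive frames into one (mean, standard deviation) pair per block. It is an illustration only, not the exact pipeline of this study: the function names, the 25 ms frame length, the 1 s block length and the synthetic test tone are all assumptions chosen for the example. MFCC vectors could be summarized per block in the same way, and the resulting block descriptors would then be passed to an SVM.

```python
import math
from statistics import mean, pstdev


def frame_energies(signal, frame_len, hop):
    """Short-time energy (mean squared amplitude) of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def block_features(values, frames_per_block):
    """Summarize consecutive frames into one (mean, std) pair per block."""
    features = []
    last_start = len(values) - frames_per_block
    for start in range(0, last_start + 1, frames_per_block):
        block = values[start:start + frames_per_block]
        features.append((mean(block), pstdev(block)))
    return features


# Synthetic signal (assumption for the demo): one second of a quiet
# 440 Hz tone followed by one second of a louder one, sampled at 8 kHz.
sample_rate = 8000
signal = [
    (0.1 if n < sample_rate else 0.8)
    * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(2 * sample_rate)
]

# 200 samples = 25 ms frames without overlap; 40 frames = 1 s blocks.
energies = frame_energies(signal, frame_len=200, hop=200)
blocks = block_features(energies, frames_per_block=40)

for index, (avg, spread) in enumerate(blocks):
    print(f"block {index}: mean energy={avg:.4f}, std={spread:.6f}")
```

Summarizing at the block level keeps local temporal structure (each block describes about one second of audio here) while still producing fixed-length vectors that a classifier can consume.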