Legacy page 2512
IAB Classification of domain or text
Any text classification engine needs a good training dataset. The more accurate the training dataset is, the higher the accuracy of the classification exercise.
Common Screens provides manually created training datasets for IAB classification:
- English language IAB classification training dataset
- Multilingual IAB classification training dataset
The following minimal PHP example uses the teamtnt/tntsearch project. The same approach applies in other machine learning environments.
use TeamTNTTNTSearchClassifierTNTClassifier;
$classifier = new TNTClassifier();
$classifier->learn("A great game", "Sports");
$classifier->learn("The election was over", "Not sports");
$classifier->learn("Very clean match", "Sports");
$classifier->learn("A clean but forgettable game", "Sports");
$guess = $classifier->predict("It was a close election");
var_dump($guess['label']); // returns "Not sports"
In practice, load the training model into a database and iterate through rows feeding examples into $classifier->learn("A clean but forgettable game", "Sports");. Then iterate through the target metadata file and predict each domain category using a combination of title, description, and keywords.
You can train once and reuse the generated model by saving it and loading it again as needed, or by keeping it in memory for faster processing.
Saving the classifier
$classifier->save('sports.cls');
Loading the classifier
$classifier = new TNTClassifier();
$classifier->load('sports.cls');