Key Features:
- Available as a Python command-line tool, a Python API, and an experimental TFJS version (which powers our web demo).
- Trained on a dataset of over 25M files across more than 100 content types.
- On our evaluation, Magika achieves 99%+ average precision and recall, outperforming existing approaches.
- Supports more than 100 content types.
- Batching: you can pass multiple files to the command line and the API at once, and Magika will batch them to speed up inference.
- Near-constant inference time, independent of file size: Magika only uses a limited subset of each file's bytes.
- Supports three prediction modes to tune the tolerance to errors: high-confidence, medium-confidence, and best-guess.
- Open source with more enhancements in the pipeline.
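To illustrate why reading only part of a file keeps inference time roughly constant, here is a minimal sketch of extracting a fixed-length byte window from the start and end of a file. This is not Magika's actual implementation: the function name `extract_features`, the sizes `HEAD_SIZE` and `TAIL_SIZE`, and the padding value `PAD_BYTE` are hypothetical choices for illustration.

```python
import os
import tempfile

# Hypothetical sizes and padding token, for illustration only (not Magika's configuration).
HEAD_SIZE = 512
TAIL_SIZE = 512
PAD_BYTE = 256  # outside the 0..255 byte range, so padding never looks like real data


def extract_features(path: str, head_size: int = HEAD_SIZE, tail_size: int = TAIL_SIZE) -> list[int]:
    """Return a fixed-length list of byte values from the start and end of a file.

    At most head_size + tail_size bytes are read, so the cost does not grow
    with the file size.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(head_size)
        # Jump to the last tail_size bytes (or the start, for small files).
        f.seek(max(0, size - tail_size))
        tail = f.read(tail_size)
    # Pad the head on the right and the tail on the left so every file
    # produces a vector of the same length.
    head_feat = list(head) + [PAD_BYTE] * (head_size - len(head))
    tail_feat = [PAD_BYTE] * (tail_size - len(tail)) + list(tail)
    return head_feat + tail_feat


if __name__ == "__main__":
    # Demo: a tiny file and a large file both yield a 1024-element feature vector.
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "small.txt")
        large = os.path.join(tmp, "large.bin")
        with open(small, "wb") as f:
            f.write(b"abc")
        with open(large, "wb") as f:
            f.write(b"A" * 10_000_000)
        print(len(extract_features(small)), len(extract_features(large)))
```

Because the feature vector always has the same shape, a model consuming it does the same amount of work for a 3-byte file as for a multi-gigabyte one.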
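One plausible way to picture the three prediction modes is as different thresholds applied to the model's top score. The sketch below assumes that reading; the names `PredictionMode`, `select_label`, `MEDIUM_THRESHOLD`, `HIGH_THRESHOLDS`, and the `"unknown"` fallback label are all hypothetical, and the threshold values are made up rather than taken from Magika.

```python
from enum import Enum


class PredictionMode(Enum):
    BEST_GUESS = "best-guess"
    MEDIUM_CONFIDENCE = "medium-confidence"
    HIGH_CONFIDENCE = "high-confidence"


# Hypothetical thresholds, for illustration only.
MEDIUM_THRESHOLD = 0.5
HIGH_THRESHOLDS = {"python": 0.9, "javascript": 0.95}  # per-type thresholds
DEFAULT_HIGH_THRESHOLD = 0.9
FALLBACK_LABEL = "unknown"  # returned when the top score is not confident enough


def select_label(scores: dict[str, float], mode: PredictionMode) -> str:
    """Pick a content-type label from per-type scores according to the mode."""
    label, score = max(scores.items(), key=lambda kv: kv[1])
    if mode is PredictionMode.BEST_GUESS:
        # Always trust the top prediction, however low its score.
        return label
    if mode is PredictionMode.MEDIUM_CONFIDENCE:
        threshold = MEDIUM_THRESHOLD
    else:
        # High-confidence mode uses a stricter, per-type threshold.
        threshold = HIGH_THRESHOLDS.get(label, DEFAULT_HIGH_THRESHOLD)
    return label if score >= threshold else FALLBACK_LABEL
```

For example, with scores `{"python": 0.7, "javascript": 0.2, "html": 0.1}`, best-guess and medium-confidence both return `"python"`, while high-confidence falls back to `"unknown"`. Stricter modes trade coverage for fewer wrong labels.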