

The digital age calls for Arabic content management

Content analysis in Arabic is crucial for a large number of organisations. The dramatic increase in Arabic data requires tools which are adapted to unprecedented quantities, speed and variety of data.
Thanks to unequalled linguistic resources and its processing platform, Arabic Box handles Arabic as well as it would English or French, enabling users to add advanced Arabic content processing features to their workflow.

Modular system

Arabic Box is an appliance whose modules can be combined to create specific
solutions for managing Arabic content.
• Linguistic analysis and data enrichment
• Entity extraction/Part of Speech Tagging
• Arabic OCR improvement
• Semantic binding

Arabic Box is scalable: components can be activated on the fly to meet new needs.

Turn your system into
an Arabic expert

The Arabic Box set of components supports many applications. It is set up according to your workflow and your specific processing needs.
Arabic Box can be plugged into any information system through its API and a set of connectors to the main ECM systems.


Linguistic analysis and enrichment

Linguistic analysis and processing is the keystone of any information system that manages Arabic. It relies on a set of processing modules (including segmentation, lemmatization, morphological and morphosyntactic analysis, coding and vowelization) and unique linguistic resources (the largest Arabic lexicon).

Entity extraction/POS Tagging module

Complex and named entities are the most valuable units in a text. They cover names of persons, organizations, events, brands, products and locations.
POS Tagging is a component that analyses and tags content. It gives a fine-grained description of linguistic units, drawing on domain-specific lexicons: banking, politics, geography, etc.
The tags produced by this analysis enrich documents, enabling smart search and providing valuable material for BI systems.

Arabic OCR

Many companies and organizations have to manage archives that exist only in image format in their ECM software. Unfortunately, such content cannot be searched or retrieved quickly.
Arabic OCR is a component that combines high-end OCR software with dedicated Arabic libraries.

Semantic binding

Semantic binding is about making smart links between your documents. This network is built from shared named entities (names, locations, events) and metadata. With such a network, your search engine can offer results beyond the limits of keyword matching: you search not only for words but for topics.
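The linking idea described above can be illustrated with a minimal Python sketch. It is not the Arabic Box implementation: the function names (`build_links`, `related`), the document names and the entity sets are hypothetical, and the entities are assumed to have already been extracted by an upstream tagging step. Two documents are linked whenever they share at least one named entity, and the link is stronger the more entities they share.

```python
from collections import defaultdict
from itertools import combinations


def build_links(doc_entities):
    """Map each pair of documents to the set of named entities they share.

    doc_entities: dict of document id -> set of entity strings
    (hypothetical input, assumed to come from an entity extraction step).
    """
    # Inverted index: entity -> documents that mention it
    docs_by_entity = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            docs_by_entity[entity].add(doc_id)

    # Every pair of documents mentioning the same entity gets a link
    links = defaultdict(set)
    for entity, docs in docs_by_entity.items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(entity)
    return dict(links)


def related(doc_id, links):
    """Return documents linked to doc_id, most shared entities first."""
    neighbours = []
    for (a, b), shared in links.items():
        if doc_id == a:
            neighbours.append((b, len(shared)))
        elif doc_id == b:
            neighbours.append((a, len(shared)))
    # Sort by descending link strength, then by name for a stable order
    return [d for d, _ in sorted(neighbours, key=lambda x: (-x[1], x[0]))]


# Illustrative data only: three documents with pre-extracted entities
docs = {
    "report.pdf": {"القاهرة", "البنك المركزي"},
    "memo.docx": {"القاهرة", "البنك المركزي", "دبي"},
    "news.html": {"دبي"},
}
links = build_links(docs)
print(related("memo.docx", links))  # ['report.pdf', 'news.html']
```

In this sketch a search that lands on memo.docx can also surface report.pdf and news.html, even if the query keywords appear in neither, because the documents are connected through the entities they mention.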


Arabic Box, compact and connectable

Arabic Box has been designed to connect to your current system. Search engines, DMS, CRM software, imaging storage systems: Arabic Box provides connectors to all the main ECM systems.

Plug and use

Arabic Box can be plugged into your current system without changing the core of your IT infrastructure. Through its API or web service, your system easily benefits from the power of Arabic Box.
Connectors to main ECM systems are available.

System compatibility

ECM Imaging storage dynamization

Captiva - Documentum
Newgen Software

Search engines enhancement

HP Autonomy

Arabic data analytics enabler


CRM software enhancement

Microsoft Dynamics


One size does not always fit all

The question is not just one of light customization, but of answering your real needs. To meet real Arabic content management challenges, Arabic Box components can be combined in many ways to deliver the features your workflow actually needs. Our marketing and development teams are ready to design a made-to-measure solution that answers your challenges.

Here are some examples of Arabic Box applications.

GSA Arabic enhancement

Your system includes Google Search Appliance.

‣ You want to make your GSA fully Arabic-compatible.

The Arabic Linguistic Processing + Tagging module lets you:
‣ process Arabic content according to linguistic rules
‣ enrich documents (tagging)
‣ enrich the GSA index
‣ benefit from synonym and cross-lingual features

ECM Imaging storage dynamization

Your system digitizes and indexes documents through imaging systems.

‣ You want full-text access to the content of your image archive

The OCR Extraction & Search module lets you:
‣ convert image documents into text format
‣ apply Arabic linguistic processing
‣ enrich documents (tagging)
‣ enrich your system index
‣ retrieve information in full-text mode

ECM Search engines enhancement

Your ECM system returns poor results for searches in Arabic.

‣ You want to have relevant results when you search for Arabic content

The Arabic Linguistic Processing + Tagging module lets you:
‣ process Arabic content according to linguistic rules
‣ enrich documents (tagging)
‣ bind content according to topics (semantic binding)

Arabic data analytics enabler

Your system processes large flows of varied data, including Arabic content.

‣ You need mining and visualization tools to analyse Arabic data

The Arabic Data Processing module lets you:
‣ mine Named Entities through Part of Speech analysis
‣ display Named Entities in a clear visualization system
‣ search for content related to Named Entities

NPS Arabic

You would like to survey customer loyalty to your company or brand.

‣ Your customers are Arabic speakers and you would like to analyse their verbatim comments

Arabic NPS features:
‣ NPS software (customizable dashboards and campaigns)
‣ Arabic linguistic processing and tagging (Verbatim)
‣ Verbatim instant tagging
‣ Semantic search engine


Data exploitation

A full set of widgets that serve the system interface and help users
access information more easily.

‣ These widgets let you visualize your data through different displays to get a new point of view on results.

Timeline
Interface design elements
Back Office dashboard and reports