With 'DAX' And 'MAX', IBM Wants To Help Developers On Their AI Projects

Training artificial intelligence (AI) models requires a ton of data. And good data doesn't come easily.

IBM’s Center for Open-Source Data and AI Technologies (CODAIT) has introduced two carefully curated repositories designed to provide machine learning developers with models and datasets for their AI projects.

The first one is called 'MAX', or Model Asset Exchange. First launched in 2018, it is an online, open-source repository of trainable and deployable AI models. It is designed to be easy to understand, so novice developers can use it without prior knowledge.

The second one is called 'DAX', or Data Asset Exchange. Announced on July 16 at OSCON 2019, DAX hosts datasets that developers can use to train their AI models.

DAX uses standardized dataset formats and metadata, in contrast with most other open dataset resources, which tend to apply fewer checks on data quality and licensing terms.

A repository that is both carefully curated and open source is rare.

As AI development continues to grow, more and more projects will leverage machine learning technology.

"DAX helps create end-to-end deep learning workflows (from using the data to train models to deploying models in standard ways) allowing developers to consume open data with confidence under clearly defined open data licenses," IBM wrote in its blog post.

"Where possible, datasets posted on DAX will use the Linux Foundation’s Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration. Furthermore, DAX provides unique access to various IBM and IBM Research datasets. IBM plans to publish new datasets on the Data Asset eXchange regularly. The datasets on DAX will integrate with IBM Cloud and AI services as appropriate."

DAX was created by IBM researchers based on input from Watson developers, matching the characteristics of the target text to those of the real-world documents that the system analyzes in production.

DAX's datasets are ready for developers to use in enterprise AI applications, and they come with related content, such as tutorials, to make getting started easier.


According to Fred Reiss, Chief Architect at CODAIT:

"We wanted to bring the same level of quality to the open source data that you run through this open source software. So we’re following a much more controlled approach with DAX, compared with other repositories of data sets you might find online."

"Every dataset in DAX is shepherded by a member of our team and reviewed by multiple other people within IBM. We start by collecting detailed information about the origins of the dataset and what kinds of problems the dataset would be a good fit for."

"When possible, we reach out to the original creator of the data. We collect detailed metadata about where the data comes from. We familiarize ourselves with the research papers behind the datasets. We even look at the actual data items themselves to check for potential legal and data quality issues."

"Every dataset goes through IBM’s own internal legal review process. Only then does a dataset go 'live' on the site."

The CODAIT team’s goal is to make it straightforward to use both DAX and MAX assets with IBM AI products, as well as with other hybrid and multicloud AI tooling, both proprietary and open source.

"We want to give data scientists and developers well-curated data starting points, so that it’s easier for them to start developing their AI applications and solutions."

Published: 27/07/2019