Challenge Criteria
The Semantic Web Challenge 2014 is defined in terms of minimum requirements and additional desirable features that submissions should exhibit. Both are listed below.
The challenge was originally intended to showcase developments in, and the potential of, Semantic Web technologies. These technologies have now reached a certain level of maturity, so we expect that, in addition to showcasing the technologies, entries for the challenge should demonstrate one or more of:
- a clear commercial potential;
- a large existing user base; or
- functionality that is useful and of societal value.
This applies to submissions for either track.
Open Track
Minimal requirements
- The application has to be an end-user application, i.e. an application that provides practical value to general Web users or, if this is not the case, at least to domain experts. It should showcase functionality that the use of Semantic Web technologies can bring to an application.
- The information sources used
  - should be under diverse ownership or control;
  - should be heterogeneous (syntactically, structurally, and semantically); and
  - should contain substantial quantities of real world data (i.e. not toy examples).
- The meaning of data has to play a central role.
  - Meaning must be represented using Semantic Web technologies;
  - Data must be manipulated/processed in interesting ways to derive useful information; and
  - This semantic information processing has to play a central role in achieving things that alternative technologies cannot do as well, or at all.
Additional Desirable Features
In addition to the above minimum requirements, we note other desirable features that will be used as criteria to evaluate submissions.
- The application provides an attractive and functional Web interface (for human users)
- The application should be scalable (in terms of the amount of data used and in terms of distributed components working together). Ideally, the application should use all data that is currently published on the Semantic Web.
- Rigorous evaluations have taken place that demonstrate the benefits of semantic technologies, or validate the results obtained.
- Novelty, in applying semantic technology to a domain or task that has not been considered before
- Functionality is different from or goes beyond pure information retrieval
- Contextual information is used for ratings or rankings
- Multimedia documents are used in some way
- There is a use of dynamic data (e.g. workflows), perhaps in combination with static information
- The results should be as accurate as possible (e.g. use a ranking of results according to context)
- There is support for multiple languages and accessibility on a range of devices
Big Data Track
The original goal of the Billion Triples Track was to demonstrate the capability of semantic technologies to process very large and messy data as typically found on the public Web.
To demonstrate this and address real scalability issues, the previous Billion Triples Challenge (BTC) minimal requirements stipulated that competing applications make use of a specific Billion Triple Challenge Dataset provided by the SWC organizers.
The practical and research problems of dealing with very large data sets are now ubiquitous, so we have opened up this competition to more researchers, allowing them to showcase real progress in dealing with their own big data. To reflect this, the track was renamed the Big Data Track.
For those who do not have their own large datasets, however, a 2014 Billion Triples Challenge dataset will be provided which can be used by submissions.
The primary goal of the Big Data Track is to demonstrate approaches that can work on Web scale using realistic Web-quality data.
The specific goal of the Big Data Track is to demonstrate the scalability of applications.
We stress that the goal is not to benchmark triple stores against one another, but rather to demonstrate applications that can work on Web scale using realistic Web-quality data.
Minimal requirements
- Data Volume: The applications must make use of a very large dataset. It can be your own, from a company, or from the web, but its size needs to be clearly indicated. Bigger is better.
- Data Variety: The data used by the applications must be varied, both structured and unstructured, preferably from diverse sources and in different formats. Integrating these diverse structured and unstructured data types quickly into useful information might require more video, audio and XML usage within your systems. How well do your data management practices cope with integrating and designing with these new data types? The variation and sources of data need to be clearly identified. Messier is better.
- Data Velocity: Data streams vary quickly, so big data streams need to be understood, prioritized and integrated into your application quickly. The time for data design, performance tuning and especially maintenance needs to be compressed. What are your strategies for dealing with this velocity? How automated and reusable are your (data management) processes and practices?
The tool or application does not have to be an end-user application as defined for the Open Track, but usability and value are of concern. The key goal of the Big Data Track is to demonstrate a meaningful interaction with big data sets driven by a user or an application.
The functionality of the applications is left open: for example, it could involve helping people figure out what is in the data set via browsing, visualization, profiling, etc., or inferencing that adds information not directly queryable in the original data set.
Additional Desirable Features
In addition to the above minimum requirements, we note other desirable features that will be used as criteria to evaluate submissions.
- The application should do more than simply store/retrieve large numbers of triples.
- The application or tool(s) should be scalable (in terms of the amount of data used and in terms of distributed components working together)
- The application should either function in real-time or, if pre-computation is needed, have a real-time realization (but we will take a wide view of "real time" depending on the scale of what is done)