We read for three purposes: (1) for entertainment, such as reading fiction; (2) for information, such as reading tweets and news headlines; (3) for understanding, such as reading an essay or an academic paper. These purposes often overlap. Reading for understanding is the most sophisticated of the three, and it plays a major role in how people acquire knowledge from printed materials.
We believe that the best approach to reading for understanding is to read the central ideas first, starting with the most critical part of a document, and then gradually expand coverage in descending order of importance until the entire document is covered. However, reading a document in the order in which its sentences are presented has been the dominant way of reading for thousands of years, because it is impossible to know which part is important until one has read the whole document.
We fill this gap by inventing and developing Dooyeed, a software tool that uses natural language processing, intelligent text management and text mining algorithms, and optimization techniques to automatically and accurately identify and highlight blocks of a document in descending order of importance, in order to facilitate reading for understanding. A block consists of sentences that may or may not be consecutive in the original document. Dooyeed thus lets users concentrate on the most important block first, then move on to the next most important block, shown together with the previous blocks in the original sentence order of the document, and continue in this fashion until they have read the entire document or a chosen layer of blocks.
The core idea of this technology can be described as follows. Given a text document, Dooyeed first ranks each sentence based on the syntactic and semantic relations between words, the topics contained in the document, and the structure of the document. It then extracts sentences according to the distributions of salience scores and topics to form blocks of sentences, maximizing the content coverage and diversity of each block.
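The trade-off between salience and diversity can be illustrated with a minimal sketch. This is not Dooyeed's actual ranking method, which the text does not specify in detail. As a stand-in, the sketch scores each sentence by its bag-of-words cosine similarity to the whole document and then picks sentences greedily in the style of maximal marginal relevance, penalizing candidates that overlap with sentences already chosen. The names `select_block`, `salience_scores`, and the `trade_off` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def salience_scores(sentences):
    """Toy salience (an assumption): similarity of each sentence to the document."""
    bags = [Counter(tokenize(s)) for s in sentences]
    doc = Counter()
    for bag in bags:
        doc.update(bag)
    return bags, [cosine(bag, doc) for bag in bags]


def select_block(sentences, k, trade_off=0.7):
    """Greedily pick up to k sentence indices, balancing salience and diversity."""
    bags, scores = salience_scores(sentences)
    chosen = []
    candidates = set(range(len(sentences)))
    while candidates and len(chosen) < k:
        def gain(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(bags[i], bags[j]) for j in chosen), default=0.0)
            return trade_off * scores[i] - (1 - trade_off) * redundancy
        best = max(sorted(candidates), key=gain)  # ties go to the earlier sentence
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)  # restore the original document order
```

With `trade_off` close to 1 the selection follows salience alone; lower values push the block toward sentences that add new content, which is one simple way to approximate coverage and diversity.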
Users may choose the block size, that is, the number of sentences at each layer, that best matches their reading capability. For example, the first-layer block may consist of 10% of the sentences of the entire document, the second layer 30%, the third layer 60%, and the last layer the entire document. Dooyeed also provides default options for the number of layers and the block size at each layer.
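As a small illustration of the layer settings, the sketch below converts percentages into sentence counts. It assumes the percentages are cumulative, so that each layer includes all the sentences of the layers before it; the text does not state this explicitly. The function name `layer_sizes` and its default values are hypothetical and only mirror the example above.

```python
import math


def layer_sizes(num_sentences, cumulative_fractions=(0.1, 0.3, 0.6, 1.0)):
    """Return the cumulative number of sentences shown at each layer."""
    if not cumulative_fractions:
        raise ValueError("at least one layer is required")
    previous = 0.0
    for fraction in cumulative_fractions:
        if not previous < fraction <= 1.0:
            raise ValueError("fractions must be strictly increasing and in (0, 1]")
        previous = fraction
    sizes = []
    for fraction in cumulative_fractions:
        # Round before taking the ceiling to absorb floating-point noise,
        # and round up so that a short document still shows one sentence.
        size = math.ceil(round(fraction * num_sentences, 9))
        sizes.append(min(num_sentences, size))
    return sizes
```

For a 50-sentence document the defaults give layers of 5, 15, 30, and 50 sentences.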
Each new block of sentences is visually distinguished from the previous blocks. We may imagine each block as a surface raised in the third dimension to a certain height, so that the most critical block sits at the highest layer, the least important block at the lowest, and the remaining blocks at the layers in between according to their importance. To implement this idea on a 2D computer screen, we use different colors to represent different layers, similar to contour lines on a topographic map.
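One possible way to realize the color layering is sketched below, assuming the document is rendered as HTML and each sentence already carries a layer index (0 for the most important layer). The palette, the name `render_layers`, and the choice of HTML are illustrative assumptions, not a description of how Dooyeed draws its display.

```python
import html

# Hypothetical palette: the warmest color marks the most important layer.
LAYER_COLORS = ["#d73027", "#fc8d59", "#fee08b", "#d9ef8b"]


def render_layers(sentences, layer_of):
    """Return an HTML string in which each sentence is shaded by its layer."""
    spans = []
    for sentence, layer in zip(sentences, layer_of):
        # Layers beyond the palette reuse the last (coolest) color.
        color = LAYER_COLORS[min(layer, len(LAYER_COLORS) - 1)]
        spans.append(
            '<span style="background-color: {}">{}</span>'.format(
                color, html.escape(sentence)
            )
        )
    return " ".join(spans)
```

Because sentences are emitted in their original order, the reader sees the whole text with the important layers standing out, much like elevation bands on a map.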
In addition to separating a document into layers of importance, Dooyeed can also help test whether the reader has reached a certain level of comprehension by asking the reader a set of questions that are generated and graded automatically based on the contents of the document.
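People read for three purposes: (1) Reading for entertainment, such as reading novels and poetry. (2) Reading for information, such as reading newspapers and magazines. (3) Reading for understanding (also called reading for enlightenment), that is, reading to gain new knowledge and new insights, such as reading textbooks and academic papers.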
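RUR (Reading-for-Understanding Reader) is an intelligent reading aid invented to reform the traditional way of reading. It aims to give readers a new and efficient experience of reading for understanding. RUR combines text mining and artificial intelligence techniques, and its core idea is as follows. Given a text, such as a paper or a book chapter, it first computes an importance score for every sentence. It then extracts fixed proportions of sentences in descending order of score to form several text regions, such that every sentence in region 1 scores higher than the sentences in all other regions, every sentence in region 2 scores higher than every sentence in the remaining regions, and so on, which ranks all the regions. Within each region the sentences keep the order they have in the original text, and they need not be consecutive in the original text.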
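The partitioning step described above can be sketched as follows, taking the importance scores as given. How RUR computes the scores is not shown here, and the function name `partition_into_regions` and the explicit `region_sizes` argument are hypothetical.

```python
def partition_into_regions(scores, region_sizes):
    """Split sentence indices into ranked regions by descending score."""
    if sum(region_sizes) != len(scores):
        raise ValueError("region sizes must add up to the number of sentences")
    # Rank indices by score, highest first; ties go to the earlier sentence.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    regions = []
    start = 0
    for size in region_sizes:
        # Sorting the indices restores the original sentence order in the region.
        regions.append(sorted(ranked[start:start + size]))
        start += size
    return regions
```

For scores `[0.2, 0.9, 0.5, 0.7, 0.1]` and region sizes `[2, 2, 1]`, the regions are `[[1, 3], [0, 2], [4]]`: region 1 holds the two highest-scoring sentences, and the indices inside each region follow the original order even though the sentences are not adjacent.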
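Implementing RUR requires computing the importance score of every sentence quickly and accurately. The research results of the RUR Read development team in this area are world-leading in both accuracy and speed. For example, in a comparison on the annotated dataset SummBank, our algorithm is more accurate than each individual expert and is comparable to the collective judgment of the expert group.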