Automated UN Document Summaries

How might we make the United Nations' historical document archive, and future UN documentation, more accessible?


The UN's Official Document System Search (ODS), built by the Technical Services Team in UN-OICT, is an open door to the entire collection of UN documents, all the way back to the organization's founding. In many ways, this archive is a record of world history over the past decades. How might we make that record, and future UN documentation, more accessible?

Our team collaborated with machine intelligence research company Fast Forward Labs (FFL) to improve keyword and topic extraction on official UN documents. Using machine learning techniques, FFL was able to enrich the set of tags and classifiers applicable to UN documents so that they cover a greater range of the corpus. This approach offers flexibility across multiple languages and can be more sophisticated than regular-expression matching.

The first experiment, led by FFL's Data Scientist Micha Gorelick, resulted in a system that reorders the sentences in each document according to their potential to act as summaries. These higher-potential sentences are automatically brought to the top, rearranging the document into a more efficient read through a process called "extraction-based summarization."

For example, the 2007 paper “Arrangements for the Secretary-General’s high-level event entitled ‘The future in our hands: addressing the leadership challenge of climate change’” can be summarized, in part, as the following:

The reports issued by the Intergovernmental Panel on Climate Change in 2007 show clearly that the warming of the Earth’s climate system is unequivocal and attributable to human activities. The high-level event will build on progress made to date in the framework of the Convention process and will take into account recent initiatives by other organs of the United Nations, notably the thematic debate that the President of the General Assembly convened in New York on 31 July and 1 and 2 August 2007.

The Secretary-General has encouraged the participation of all Heads of State or Government in the high-level event and has issued invitations to that effect. The Deputy Secretary-General and the three Special Envoys on Climate Change of the Secretary-General will serve as facilitators of one thematic plenary each, on behalf of the Secretary-General.
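The reordering described above can be illustrated with a small sketch. The post does not say how FFL's system scores sentences, so the code below swaps in a deliberately simple stand-in: each sentence is scored by the average corpus frequency of its non-stopword terms, and the document is then sorted by that score. The function names, the stopword list, and the scoring rule are all assumptions made for illustration, not the actual implementation.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "is", "are", "on", "of", "and", "to", "in", "that", "will"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def score_sentences(text):
    """Return (score, original_index, sentence) for every sentence.

    Score = mean document-wide frequency of the sentence's terms.
    This is a stand-in heuristic, not the scoring used by FFL.
    """
    freq = Counter(tokenize(text))
    scored = []
    for idx, sentence in enumerate(split_sentences(text)):
        tokens = tokenize(sentence)
        score = sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0
        scored.append((score, idx, sentence))
    return scored

def reorder_by_score(text):
    """Sentences sorted from highest to lowest score; ties keep document order."""
    ranked = sorted(score_sentences(text), key=lambda t: (-t[0], t[1]))
    return [sentence for _, _, sentence in ranked]
```

Taking the first few sentences of `reorder_by_score(document_text)` gives an extractive summary, while keeping the full sorted list gives the "rearranged" reading order the post describes.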

Let’s Compare

For comparison, the original:

And a sortable, rearrangeable version produced by the described process. Sorting by “score” will rearrange the document into its summary-optimized form:


Distribution of strong summary sentences across the entire ODS collection.

The tool was run across the thousands of existing documents in the UN ODS. The chart shows where in a document the most important sentences tend to occur, aggregated across the entire collection. The x-axis shows position as a percentage of the way into the document. The y-axis shows the number of sentences, across the entire corpus, that rank among the top 5 sentences of their document.

Many occur near the beginning of the document, but the distribution is quite wide. This suggests that there is limited structural consistency from document to document across the collection.
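A chart like the one described could be tallied roughly as follows. This is a sketch under stated assumptions: each document is represented only as a list of per-sentence scores in reading order, and the function names, the `k` parameter, and the 10-point bin width are hypothetical choices, not details taken from the original analysis.

```python
from collections import Counter

def top_sentence_positions(scores, k=5):
    """Relative positions (0-100, as percent into the document) of the k best sentences.

    `scores` holds one score per sentence, in reading order.
    """
    n = len(scores)
    if n == 0:
        return []
    # Highest score first; ties resolved in favor of the earlier sentence.
    top = sorted(range(n), key=lambda i: (-scores[i], i))[:k]
    return [100.0 * i / n for i in top]

def position_histogram(corpus_scores, k=5, bin_width=10):
    """Count top-k sentences per position bin across all documents.

    Keys are the lower edge of each bin (x-axis: percent into the document);
    values are sentence counts (y-axis).
    """
    hist = Counter()
    for scores in corpus_scores:
        for pct in top_sentence_positions(scores, k):
            hist[int(pct // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))
```

Because positions are normalized by document length, documents of very different sizes contribute to the same x-axis, which is what makes a corpus-wide comparison possible.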


In addition to improving search and providing more efficient access to existing documents, this technique might also help in analyzing long-form text-based data in projects like the OCHA Libya Monitoring Tool, described in the Unite Newsletter here.


Watch Fast Forward Labs' webinar below to learn more about the technology, process, and potential applications for this:
