============================================================================
COLING 2014 Reviews for Submission #387
============================================================================

Title: Mining temporal footprints for Linked Data resources

Authors: Michele Filannino and Goran Nenadic

============================================================================
                            REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper presents a method to extract the time span of existence for entities, e.g. the lifespan of a person or the duration of an event, from an encyclopedic description of the entity. The proposed method is based on extracting explicit dates as well as temporal expressions from the text, followed by fitting a Gaussian distribution to the extracted timestamps.

While I think the method has some merit, there are some serious shortcomings and omissions in the paper:

- The motivation of the work, namely the lack of infoboxes in Wikipedia from which timespans can be extracted, is only partly valid. While it is true that a large fraction of articles do not have an infobox, timespans can easily be derived from Wikipedia categories, which are present for nearly every article. This has been exploited by numerous works, e.g. [1], and specifically for temporal facts/timespans by YAGO2 [2]. Especially the omission of YAGO2, which explicitly focuses on temporal and spatial facts, stands in contrast to the authors' claim that knowledge bases are mostly static. Similarly, Freebase has a large number of temporal facts, including start and end dates for entities. The authors need to compare the coverage of their approach to existing data, showing that there is merit when applied to Wikipedia, or motivate the work with an example that goes beyond Wikipedia.

- The discussion of related work, and the definition of appropriate baselines, is not thorough enough. Given that there is room for nearly one more page of writing, I would expect a better discussion of related work.

There are other, minor things which the authors could address:

- The presented results look fine; however, I would appreciate an interpretation of the actual values of MDE - is 0.4 good or bad?

- The paper gives a mathematical description of MDE, which is the target for optimization of some hyper-parameters. However, there is no further information on what kind of data the parameters are tuned on and how well this works.

The open issues, especially the missing discussion and omission of strongly related work, should be addressed.

============================================================================
                            REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The authors propose a model to find temporal footprints, based on Gaussian fitting. More specifically, they try to find the birth year and death year of a person by making use of the year mentions in the description on Wikipedia. This problem is quite interesting. The authors attempt to demonstrate the performance of different approaches in the experiments. However, the proposed model is quite simple and is not easy to apply to other domains.
For example, the lower and upper bounds are simply decided by two parameters $\alpha$ and $\beta$. The $\alpha$ parameter is used to control the length of the extracted period. These two parameters need to be tuned for every other domain and may require prior knowledge. Furthermore, the best-performing model in the paper only considers the simplest date format, using error-prone regular expression extraction. It would be much more difficult to also consider other types of date formats. The claim that existing Linked Data resources ignore the temporal period that makes a fact true is not fair, since the temporal footprints extracted for people are already included in existing Linked Data resources. In addition, the proposed model is not necessarily related to Linked Data, since DBpedia is only used to provide ground-truth data for the people.

============================================================================
                            REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper investigates the use of several approaches for extracting temporal footprints from Wikipedia articles.

Strengths:

- The problem is interesting.
- The correlation between the prediction performance and the length of the textual content is interesting.
- The presentation is nice.

Weaknesses:

- The proposed method is rather simple, with limited novelty. Most of the techniques are adapted from other papers.
- What is the justification for using Gaussian fitting? Is there evidence to support that?
- The write-up has several typos and errors that need to be carefully revised. E.g., on page 6, Fig. 4 is wrongly referred to as Fig. 6; in the caption of Fig. 5, "Figure 5" should be changed to "Figure 4"; etc.

============================================================================
AHA! 2014 Reviews for Submission #7
============================================================================

Title: Mining temporal footprints from Wikipedia concepts

Authors: Michele Filannino and Goran Nenadic

============================================================================
                            REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 4
Impact of Ideas/Results: 2
Meaningful Comparison: 3
Clarity: 5
Overall: Weak accept
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper presents the idea of detecting the temporal footprint of concepts in encyclopedia text. It is very well written and presents the methodology as well as a comparison with other technologies on finding the correct birth and death dates of people. The authors first identify date tokens in the text using simple regular expressions as well as a temporal tagger. The resulting timestamps are fitted with a Gaussian distribution, which is shifted based on parameter estimation to account for the fact that some dates might be mentioned more often than others. Finally, the lower and upper bounds are estimated from this distribution.
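For concreteness, a minimal sketch of a pipeline along these lines might look as follows. The function name, the regular expression, and the way alpha and beta enter the bound computation are illustrative assumptions, not taken from the paper; here they simply scale how far the lower and upper bounds extend from the fitted mean.

```python
import re
import statistics

def temporal_footprint(text, alpha=1.0, beta=1.0):
    """Estimate a (start, end) footprint from year mentions in `text`.

    alpha and beta are hypothetical stand-ins for the paper's tuned
    parameters; the exact role they play here is an assumption.
    """
    # 1. Extract candidate years with a simple regular expression
    #    (four-digit tokens between 1000 and 2099).
    years = [int(y) for y in re.findall(r"\b(1\d{3}|20\d{2})\b", text)]
    if len(years) < 2:
        return None

    # 2. Fit a Gaussian to the extracted timestamps (mean and std dev).
    mu = statistics.mean(years)
    sigma = statistics.stdev(years)

    # 3. Derive the lower and upper bounds from the fitted distribution.
    return round(mu - alpha * sigma), round(mu + beta * sigma)

# Toy example:
print(temporal_footprint("Peter was born 1970. He founded the company X on 1.1.2000."))
# -> (1964, 2006)
```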
While this reads like an interesting problem, I wonder how generally applicable the method is, and whether it is not the choice of Wikipedia as the target domain, with its specific characteristics, that produces the needed results.

Firstly, the authors measure their performance on people whose correct birth and death dates are already known, which eliminates the need for their technology altogether. The paper mentions question answering tasks as possible use cases, but this is not further explored.

Some more remarks:

Secondly, I have the suspicion that the method would only work on encyclopedia text, as it is usually written in chronological order with lots of temporal references. How this would be applicable to other types of text is unclear. There is also the assumption in the proposed method that the distribution of the temporal tokens follows a normal distribution. I suspect that this might be the case here since, again, Wikipedia was used to test the method, but how general is that? I would have liked to see the distribution of tokens over the text instead of the other presented charts.

Thirdly, there is also the problem of associating the temporal tokens with the concept. Not every time token might actually belong to the concept introduced. For example: "Peter was born 1970. He founded the company X on 1.1.2000." There is a correlation, as he needs to be alive to do that, but there might be a misrepresentation. This might be irrelevant for the given task and actually feed better into the prediction, but there is also the possibility that this becomes more of a problem in longer texts, which introduce more such references, as observed by the authors.

============================================================================
                            REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 3
Impact of Ideas/Results: 3
Meaningful Comparison: 4
Clarity: 4
Overall: Weak accept
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The authors introduce the problem of discovering the temporal footprints of specific concepts from the text of their Wikipedia articles. To predict the footprints, they propose a pipeline of date expression extraction, outlier filtering and distribution estimation. The experimental results show a positive impact from outlier filtering, an improvement from distribution estimation only on long documents, and a negative result when employing a sophisticated time resolution system.

In the experiments, only person lifespans are tested. It is not clear how well the same technique applies to other types of concepts such as companies, dynasties, etc. Perhaps the values of alpha and beta will need to be re-tuned (e.g. a company might last longer than a person). Tuning parameters using only 220 examples might lead to overfitting. It would be better to see if more training examples could help.

I am not fully convinced that the proposed method is the best way to tackle the problem. There are some alternative baselines. For example, one can use a relation extractor to extract birth-date and death-date relations from the text.
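To make the suggested baseline concrete, a minimal pattern-based sketch might look like the following. The patterns, the function name and the example sentence are illustrative assumptions only; a trained relation extractor would of course be considerably more robust.

```python
import re

# Illustrative patterns for a minimal rule-based lifespan baseline.
BIRTH_PATTERNS = [
    re.compile(r"\bborn\b[^.]*?\b(1\d{3}|20\d{2})\b", re.IGNORECASE),
    re.compile(r"\(\s*(1\d{3}|20\d{2})\s*[-–]"),   # "(1879-1955)" style
]
DEATH_PATTERNS = [
    re.compile(r"\b(?:died|death)\b[^.]*?\b(1\d{3}|20\d{2})\b", re.IGNORECASE),
    re.compile(r"[-–]\s*(1\d{3}|20\d{2})\s*\)"),
]

def extract_lifespan(text):
    """Return (birth_year, death_year); None where no pattern matches."""
    def first_match(patterns):
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                return int(match.group(1))
        return None
    return first_match(BIRTH_PATTERNS), first_match(DEATH_PATTERNS)

print(extract_lifespan(
    "Albert Einstein (1879-1955) was a theoretical physicist. "
    "He was born in Ulm and died in Princeton."))
# -> (1879, 1955)
```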
============================================================================
                            REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 4
Impact of Ideas/Results: 3
Meaningful Comparison: 3
Clarity: 5
Overall: Accept
Reviewer Confidence: 5

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper addresses the challenge of determining bounds on the temporal interval for which an object exists. The approach taken is well-informed and the results encouraging. Rather than directly addressing a top-level knowledge discovery task, this paper addresses a critical aspect of any QA system: the temporal status of entities used in possible responses. Overall, this is an interesting piece of work and certainly deserves to be discussed at the workshop.

The task is similar to other efforts, which should at least be noted. Specifically:

- Ji, Grishman: Knowledge base population: Successful approaches and challenges. ACL 2011. (The KBP task that year had a temporal subtask, which included a large dataset formulated like this one, and an evaluation metric for the problem in this paper's Appendix A.)
- Talukdar, Wijaya, Mitchell: Coupled temporal scoping of relational facts. WSDM 2012.
- Rula, Palmonari, Ngonga Ngomo, Gerber, Lehmann, Buhmann: Hybrid Acquisition of Temporal Scopes for RDF Data. ESWC 2014.

A few questions remained after reading the paper:

- The evaluation metric has been well considered, but as it stands it seems to give different penalties depending on the duration of the target interval. Can the authors comment on this? Here is an example (see also the small numeric sketch after these questions). If the correct interval A1 is a week long, and the response B1 is a week-long interval but two weeks too early, do we see the same score as if the correct interval A2 were a year long and the response B2 a year-long interval but two years too early? Both of these seem to be the same magnitude of error. Should they give the same score difference? A sentence or two should clear this up.

- Is an entity's temporal footprint really just its birth to death? Many entities seem to have significant impact outside of these bounds. For example, Google was started before it was called Google or instituted as a company.

- There seems to be a non-trivial problem with longer documents. Figure 3 provides an excellent overview of the presented techniques' ability to handle this problem. What is going on with the spike around 22k words? What other techniques could help with longer documents?

- The approach is designed to handle entities with sub-year granularity, but were any evaluated?
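The paper's MDE definition is not reproduced in these reviews, so the following is only a toy illustration of the distinction the first question raises: an error measured as an absolute offset penalizes the two cases very differently, while an error normalized by the true interval's duration penalizes them identically. Both functions are assumptions made for illustration and are not the paper's metric.

```python
def absolute_error(true_start, pred_start):
    """Error as a raw offset in days (not the paper's MDE; illustrative only)."""
    return abs(pred_start - true_start)

def normalized_error(true_start, true_end, pred_start):
    """Offset divided by the true interval's duration (also illustrative)."""
    return abs(pred_start - true_start) / (true_end - true_start)

# Interval A1 is one week long (7 days); prediction B1 is two weeks too early.
# Interval A2 is one year long (365 days); prediction B2 is two years too early.
week, year = 7, 365
print(absolute_error(0, -2 * week), absolute_error(0, -2 * year))
# -> 14 730   (very different penalties)
print(normalized_error(0, week, -2 * week), normalized_error(0, year, -2 * year))
# -> 2.0 2.0  (identical penalties)
```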