============================================================================
COLING 2014 Reviews for Submission #387
============================================================================

Title: Mining temporal footprints for Linked Data resources

Authors: Michele Filannino and Goran Nenadic

============================================================================
                            REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper presents a method to extract the time span of existence for entities, e.g. the lifespan of a person or the duration of an event, from an encyclopedic description of the entity. The proposed method is based on extracting explicit dates as well as temporal expressions from the text, followed by fitting a Gaussian distribution to the extracted timestamps.

While I think the method has some merit, there are some serious shortcomings and omissions in the paper:

- The motivation of the work, namely the lack of infoboxes in Wikipedia from which timespans can be extracted, is only partly valid. While it is true that a large fraction of articles do not have an infobox, timespans can easily be derived from Wikipedia categories, which are present for nearly every article. This has been exploited by numerous works, e.g. [1], and specifically for temporal facts/timespans by YAGO2 [2]. Especially the omission of YAGO2, which explicitly focuses on temporal and spatial facts, stands in contrast to the authors' claim that knowledge bases are mostly static. Similarly, Freebase has a large number of temporal facts, including start and end dates for entities. The authors need to compare the coverage of their approach to existing data, showing that there is merit when applied to Wikipedia, or motivate the work with an example that goes beyond Wikipedia.

- The discussion of related work, and the definition of appropriate baselines, is not thorough enough. Given that there is room for nearly one more page of writing, I would expect a better discussion of related work.

There are other, minor things which the authors could address:

- The presented results look fine; however, I would appreciate an interpretation of the actual values of MDE - is 0.4 good or bad?

- The paper gives a mathematical description of MDE, which is the target for optimization of some hyper-parameters. However, there is no further information on what kind of data the parameters are tuned on and how well this works.

The open issues, especially the missing discussion and omission of strongly related work, should be addressed.

============================================================================
                            REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The authors propose a model to find temporal footprints, based on Gaussian fitting. More specifically, they try to find the birth year and death year of a person by making use of the year mentions in the description on Wikipedia. This problem is quite interesting. The authors attempt to demonstrate the performance of different approaches in the experiments. However, the proposed model is quite simple and is not easy to apply to other domains.
For example, the lower and upper bounds are simply decided by two parameters $\alpha$ and $\beta$. The $\alpha$ parameter is used to control the length of the extracted period. These two parameters need to be tuned for every other domain and may require prior knowledge. Furthermore, the best-performing model in the paper only considers the simplest date format, using error-prone regular expression extraction. It would be much more difficult to also consider other types of date formats. The claim that existing Linked Data resources ignore the temporal period that makes a fact true is not fair, since the temporal footprints extracted for people are already included in existing Linked Data resources. In addition, the proposed model is not necessarily related to Linked Data, since DBpedia is only used to provide ground-truth data for the people.

============================================================================
                            REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper investigates the use of several approaches for extracting temporal footprints from Wikipedia articles.

Strengths:

- The problem is interesting.
- The correlation between the prediction performance and the length of the textual content is interesting.
- The presentation is nice.

Weaknesses:

- The proposed method is rather simple, with limited novelty. Most of the techniques are adapted from other papers.
- What is the justification for using Gaussian fitting? Is there evidence to support that?
- The write-up has several typos and errors that need to be carefully revised. E.g., on page 6, Fig. 4 is wrongly referred to as Fig. 6; in the caption of Fig. 5, "Figure 5" should be changed to "Figure 4"; etc.

============================================================================
AHA! 2014 Reviews for Submission #7
============================================================================

Title: Mining temporal footprints from Wikipedia concepts

Authors: Michele Filannino and Goran Nenadic

============================================================================
                            REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 4
Impact of Ideas/Results: 2
Meaningful Comparison: 3
Clarity: 5
Overall: Weak accept
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper presents the idea of detecting the temporal footprint of concepts in encyclopedia text. It is very well written and presents the methodology as well as a comparison with other technologies on finding the correct birth and death dates of people. The authors first identify date tokens in the text using simple regular expressions as well as a temporal tagger. The resulting timestamps are fitted with a Gaussian distribution, which is shifted based on parameter estimation to account for the fact that some dates might be mentioned more often than others. Finally, the lower and upper bounds are estimated from this distribution.
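For concreteness, a minimal sketch of a pipeline along these lines might look as follows. The function name, the regular expression, and the way alpha and beta enter the bound computation are illustrative assumptions, not taken from the paper; here they simply scale how far the lower and upper bounds extend from the fitted mean.

```python
import re
import statistics

def temporal_footprint(text, alpha=1.0, beta=1.0):
    """Estimate a (start, end) footprint from year mentions in `text`.

    alpha and beta are hypothetical stand-ins for the paper's tuned
    parameters; the exact role they play here is an assumption.
    """
    # 1. Extract candidate years with a simple regular expression
    #    (four-digit tokens between 1000 and 2099).
    years = [int(y) for y in re.findall(r"\b(1\d{3}|20\d{2})\b", text)]
    if len(years) < 2:
        return None

    # 2. Fit a Gaussian to the extracted timestamps (mean and std dev).
    mu = statistics.mean(years)
    sigma = statistics.stdev(years)

    # 3. Derive the lower and upper bounds from the fitted distribution.
    return round(mu - alpha * sigma), round(mu + beta * sigma)

# Toy example:
print(temporal_footprint("Peter was born 1970. He founded the company X on 1.1.2000."))
# -> (1964, 2006)
```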
While this reads like an interesting problem, I wonder how generally applicable the method is, and whether it is not the choice of Wikipedia as the target domain, with its specific characteristics, that produces the needed results.

Firstly, the authors measure their performance on people whose correct birth and death dates are already known, which eliminates the need for their technology altogether. The paper mentions question answering tasks as possible use cases, but this is not further explored.

Some more remarks:

Secondly, I have the suspicion that the method would only work on encyclopedia text, as it is usually written in chronological order with lots of temporal references. How this would be applicable to other types of text is unclear. There is also the assumption in the proposed method that the distribution of the temporal tokens follows a normal distribution. I suspect that this might be the case here since, again, Wikipedia was used to test the method, but how general is that? I would have liked to see the distribution of tokens over the text instead of the other presented charts.

Thirdly, there is also the problem of associating the temporal tokens with the concept. Not every time token might actually belong to the concept introduced. For example: "Peter was born 1970. He founded the company X on 1.1.2000." There is a correlation, as he needs to be alive to do that, but there might be a misrepresentation. This might be irrelevant for the given task and actually feed better into the prediction, but there is also the possibility that this becomes more of a problem in longer texts, which introduce more such references, as observed by the authors.

============================================================================
                            REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 3
Impact of Ideas/Results: 3
Meaningful Comparison: 4
Clarity: 4
Overall: Weak accept
Reviewer Confidence: 4

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The authors introduce the problem of discovering the temporal footprints of specific concepts from the text of their Wikipedia articles. To predict the footprints, they propose a pipeline of date expression extraction, outlier filtering and distribution estimation. The experimental results show a positive impact from outlier filtering, an improvement from distribution estimation only on long documents, and a negative result when employing a sophisticated time resolution system.

In the experiments, only person lifespans are tested. It is not clear how well the same technique applies to other types of concepts such as companies, dynasties, etc. Perhaps the values of alpha and beta will need to be re-tuned (e.g. a company might last longer than a person). Tuning parameters using only 220 examples might lead to overfitting. It would be better to see if more training examples could help.

I am not fully convinced that the proposed method is the best way to tackle the problem. There are some alternative baselines. For example, one can use a relation extractor to extract birth-date and death-date relations from the text.
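To make the suggested baseline concrete, a minimal pattern-based sketch might look like the following. The patterns, the function name and the example sentence are illustrative assumptions only; a trained relation extractor would of course be considerably more robust.

```python
import re

# Illustrative patterns for a minimal rule-based lifespan baseline.
BIRTH_PATTERNS = [
    re.compile(r"\bborn\b[^.]*?\b(1\d{3}|20\d{2})\b", re.IGNORECASE),
    re.compile(r"\(\s*(1\d{3}|20\d{2})\s*[-–]"),   # "(1879-1955)" style
]
DEATH_PATTERNS = [
    re.compile(r"\b(?:died|death)\b[^.]*?\b(1\d{3}|20\d{2})\b", re.IGNORECASE),
    re.compile(r"[-–]\s*(1\d{3}|20\d{2})\s*\)"),
]

def extract_lifespan(text):
    """Return (birth_year, death_year); None where no pattern matches."""
    def first_match(patterns):
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                return int(match.group(1))
        return None
    return first_match(BIRTH_PATTERNS), first_match(DEATH_PATTERNS)

print(extract_lifespan(
    "Albert Einstein (1879-1955) was a theoretical physicist. "
    "He was born in Ulm and died in Princeton."))
# -> (1879, 1955)
```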
============================================================================
                            REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Originality/Innovativeness: 4
Impact of Ideas/Results: 3
Meaningful Comparison: 3
Clarity: 5
Overall: Accept
Reviewer Confidence: 5

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper addresses the challenge of determining bounds on the temporal interval for which an object exists. The approach taken is well-informed and the results encouraging. Rather than directly addressing a top-level knowledge discovery task, this paper addresses a critical aspect of any QA system: the temporal status of entities used in possible responses. Overall, this is an interesting piece of work and certainly deserves to be discussed at the workshop.

The task is similar to other efforts, which should at least be noted. Specifically:

- Ji, Grishman: Knowledge base population: Successful approaches and challenges. ACL 2011. (The KBP task that year had a temporal subtask, which included a large dataset formulated like this one, and an evaluation metric for the problem in this paper's Appendix A.)
- Talukdar, Wijaya, Mitchell: Coupled temporal scoping of relational facts. WSDM 2012.
- Rula, Palmonari, Ngonga Ngomo, Gerber, Lehmann, Buhmann: Hybrid Acquisition of Temporal Scopes for RDF Data. ESWC 2014.

A few questions remained after reading the paper:

- The evaluation metric has been well considered, but as it stands it seems to give different penalties depending on the duration of the target interval. Can the authors comment on this? Here is an example (see also the small numeric sketch after these questions). If the correct interval A1 is a week long, and the response B1 is a week-long interval but two weeks too early, do we see the same score as if the correct interval A2 were a year long and the response B2 a year-long interval but two years too early? Both of these seem to be the same magnitude of error. Should they give the same score difference? A sentence or two should clear this up.

- Is an entity's temporal footprint really just its birth to death? Many entities seem to have significant impact outside of these bounds. For example, Google was started before it was called Google or instituted as a company.

- There seems to be a non-trivial problem with longer documents. Figure 3 provides an excellent overview of the presented techniques' ability to handle this problem. What is going on with the spike around 22k words? What other techniques could help with longer documents?

- The approach is designed to handle entities with sub-year granularity, but were any evaluated?
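The paper's MDE definition is not reproduced in these reviews, so the following is only a toy illustration of the distinction the first question raises: an error measured as an absolute offset penalizes the two cases very differently, while an error normalized by the true interval's duration penalizes them identically. Both functions are assumptions made for illustration and are not the paper's metric.

```python
def absolute_error(true_start, pred_start):
    """Error as a raw offset in days (not the paper's MDE; illustrative only)."""
    return abs(pred_start - true_start)

def normalized_error(true_start, true_end, pred_start):
    """Offset divided by the true interval's duration (also illustrative)."""
    return abs(pred_start - true_start) / (true_end - true_start)

# Interval A1 is one week long (7 days); prediction B1 is two weeks too early.
# Interval A2 is one year long (365 days); prediction B2 is two years too early.
week, year = 7, 365
print(absolute_error(0, -2 * week), absolute_error(0, -2 * year))
# -> 14 730   (very different penalties)
print(normalized_error(0, week, -2 * week), normalized_error(0, year, -2 * year))
# -> 2.0 2.0  (identical penalties)
```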