-------------------------------------------------------------------------------
FROM JBI on 13/03/2017 (1st round)
-------------------------------------------------------------------------------

Reviewer #1:

This work describes an NLP shared task challenge focusing on automatic de-identification. It extends the previous 2014 i2b2/UTHealth challenge by introducing a new corpus (psychiatric intake records) and a new "sight unseen" track to test system generalizability. Overall, this is high-impact research that promotes de-identification research at national and international scope. The manuscript is well written and ready to be published in JBI with minor edits. My minor comments are listed below:

* There are many tables; consider merging some of them to improve readability, e.g., Tables 2 and 3, and Tables 4 and 5.
* Why is it necessary to surrogate PHI? Some de-identification systems that rely on real geographical databases or knowledge bases might not perform well on surrogated data. Did you test the impact of surrogation on de-identification performance? This design choice needs to be supported with good reasons.
* Consider deleting some tables or describing their content in the text (e.g., Tables 7, 10, and 13).
* Consider revising this sentence: "The top four teams achieved F1 scores of over 64%, which provides a good starting point for development on the new data". I do not think an F1 of 64%, with poor recall and precision, is a good base to build on.
* In the conclusion, please highlight more clearly what this study contributes to de-identification research and science. Merely stating that de-identification is not solved is not novel.

Reviewer #2:

De-identification is an important process for protecting personal information in clinical records, including psychiatric intake records. The competition "2016 CEGS N-GRID Shared Tasks Track 1" ("2016 shared task" for short) can be treated as a continuation of the 2014 de-identification shared task. The results obtained by the participating teams raise several new problems, which could inspire other researchers to conduct further research. It is important to introduce the data construction process and the general exploration of the data to other researchers, and it is also necessary to produce a summary of the competition results. In this paper, the authors described this content and provided many informative results. However, the reviewer also wants to make two very important suggestions to further improve the paper:

(1) The methods, features, and conditions under which each team obtained its best result could be summarized in a table. This would save a lot of space for more important content that has not yet been included in this paper, e.g., a comprehensive and detailed error analysis of the best result obtained by each team. The reviewer thinks that such content should be provided in this paper, because this type of comparison can only be provided by the sponsor, who has collected all the results from the competing teams, and these comparisons are even more important than a simple summary of the competition results, which can be obtained from the papers published in the special issue or from other resources.

(2) More detailed and deeper result analysis should be given in this paper. For example, in the final results, some teams obtained higher precision but lower recall, while other teams' results are just the opposite. Why does this happen? Further, different methods differ in their ability to recognize different categories; why? (A worked F1 example illustrating this trade-off follows this review.)

A lot of additional important information can be found in the results collected by the sponsor. The reviewer thinks that this important information, which can only be provided by the sponsor, should be reported in this paper, rather than just listing the result obtained by each team with some rough analysis.
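The trade-off raised in point (2), and in Reviewer #1's comment on the 64% F1 score, can be made concrete through the definition of F1 as the harmonic mean of precision and recall. The numbers below are hypothetical, chosen only to illustrate a high-precision, low-recall system; they are not taken from the shared task results:

\[
F_1 = \frac{2PR}{P+R},
\qquad\text{e.g.}\quad
P = 0.85,\ R = 0.51
\;\Rightarrow\;
F_1 = \frac{2 \times 0.85 \times 0.51}{0.85 + 0.51} = \frac{0.867}{1.36} \approx 0.64.
\]

Because the harmonic mean is dominated by the smaller of its two arguments, a system can reach an F1 near 64% while still missing nearly half of the PHI instances, which is the concern both reviewers raise.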
-------------------------------------------------------------------------------
FROM JBI on 18/05/2017 (2nd round)
-------------------------------------------------------------------------------

Reviewer #1:

The authors addressed my comments in this revision.

Reviewer #2:

The results reported in the paper have been modified according to the reviewer's suggestions. However, the revised version still contains some mistakes. The CEGS N-GRID Shared Tasks are well known; to further improve the quality of the paper and reach the publication standard of JBI, the reviewer suggests that the authors carefully correct the mistakes remaining in the paper.

Minor revision:

1. In Section 3.1, paragraph 2, it is better to delete the word "also" in the first sentence to make the paper clear and coherent, because this word is also used in the first sentence of the next paragraph.
2. Figure 2 is missing from the "Revised Manuscript".
3. The word "were" in Section 5.1, paragraph 2, line 6, should not be in bold.
4. It should be "Table 6", not "Table 9", in Section 5.2.1, paragraph 1, line 3.
5. The "7" in the first line of paragraph 1 in Section 5.2.2 should be deleted.
6. The table indexes from Table 8 onwards are all wrong!
7. The figures from Figure 3 onwards in the "Revised Manuscript" should be carefully redone.
8. The order of Figure 5 and Figure 4 is wrong; the authors need to carefully correct it. TIP: DO NOT FORGET TO MODIFY THE CORRESPONDING DESCRIPTIONS IN THE MAIN BODY.
9. The format of the reference section is wrong, and some author information is missing.