Transcript:
In the last video, I talked about the mechanics of writing, and now I want to talk a little bit more about the actual craft of writing. This is not a writing course. I can’t do justice to the many, many pages that have been written about what it means to write well, so here are two books that I really like: “Style: Toward Clarity and Grace” and “On Writing Well.” They talk a lot about very important stuff, even for computer scientists: things like structuring your sentences to go from the familiar to the unknown, defining terms well, avoiding vague words, using a topic sentence for every paragraph, and removing fluff and obfuscation from sentences. These are really great books. I urge you to read them on a regular basis, because every time you read them, you discover something new and become a better writer. But I cannot do justice to the ideas in these books. They really stand on their own. Please do check them out.
Instead, I’m going to talk about something a little bit more specific to writing papers in computational linguistics and natural language processing: the kinds of audiences that you’ll be writing for. These books are more general, about writing nonfiction, and writing scientific papers is a little bit different because the audiences are more nuanced, so I want to focus on that. But again, that’s not the only important thing about writing. Check out these books; there are lots of great lessons in there.
One of the things that these books will tell you is that being a good writer involves paying attention to your readers, and scientific writing has a lot of different readers. There are the reviewers who are deciding whether to accept your paper. There are struggling students trying to replicate your paper. There are journalists trying to write an article under a deadline.
And there are your friends and family from high school or your hometown, checking out your webpage to see what Sally has been up to since she went off to university. A common failure mode of papers is focusing on one part of the audience and ignoring everybody else. Oftentimes, you just write for the reviewers, and that can be a problem. For example, if you focus on the experts, you’ll confuse struggling students or the general public, and if you focus on the general public, then you can omit the details necessary for someone to replicate your work. That destroys the scientific value of your paper and prevents someone from thoroughly reviewing your work. So what I want to do is talk about each of these audiences in general and what you can do to make your paper work for each of them.
I first want to talk about the brilliant but lazy, or overworked, reviewer. Peer-reviewed publications are the coin of the realm in academia. You can’t get promoted without them. You can’t get a job without them. And for many fields in computer science, especially artificial intelligence and things like natural language processing, there are too few reviewers to do a good job of reviewing all of these papers, and many people are overwhelmed by the amount of reviewing they have to do. There are a lot of people trying to enter the field from industry. Graduate programs are expanding, all of those students are trying to produce research, and those papers have to get reviewed. As a result, too few reviewers review too many papers, and a cynical way of looking at this is that these reviewers have to find a reason to reject a paper as quickly as possible.
And so you, defensively, as a writer, need to make it hard for them to reject your paper on some trivial or unimportant basis. You need to make them do their jobs and actually engage with your paper on a scientific level. So you need to write for these readers, who are very smart, have been in the field a long time, and can look for flaws in your paper and find them very efficiently. You need to make it easy for them to skim through your paper and realize, “Hey, maybe this isn’t totally worthless. Maybe we should actually accept this paper.” But then they have to actually read the paper, so you need to get them to that point.
These reviewers are unlikely to read your paper from start to finish. Again, they’re both lazy and smart enough to know the things that you’re going to say in, like, the introduction and the conclusion, so why should they read it? So you need to organize your paper to make it easy for them to randomly access any part of it and understand what’s going on. They should be able to skim through the section titles of your paper and get a sense of what’s going on, so don’t just call your sections “Introduction,” “Methods,” “Results,” “Conclusion.” That doesn’t tell them anything; those could be in any paper. You need to give expressive titles, like “Regularization Improves Efficiency,” and you can preface that with something like “Results,” as in “Results: Regularization Improves Efficiency.” Tell them what’s going on so that they can get a sense of what your paper is about.
Similarly, for every figure, give them a takeaway. Don’t just put a plot in and say “accuracy versus time.” Explain what’s going on. If there’s a takeaway from the figure that they should get, don’t assume that they’ll spend five minutes looking at the figure and realize what they need to take away. Say something like “The baseline performs better in the first 10 iterations because it does a worse job of exploring the space,” and don’t allow them to say in their review:
“The baseline does better in the first 10 iterations, so the method proposed in this paper is worthless.” Similarly, it needs to be clear what it is that you’re actually contributing in this paper. If it’s unclear what your contribution is, what your algorithm is, you need to make sure in each of the figures that the reviewer knows what they should be comparing to what. Don’t just give a long list of algorithms that they might not be able to decipher. Similarly, make sure that your algorithms are named well. If you have many variants, make sure that it comes across which variants do what, and don’t just give them inscrutable acronyms that only you know the meaning of.
Another thing you need to do to appease these very smart but time-constrained researchers is make sure that you have adequate coverage of the literature. Make sure that you connect not just to the latest neural network models, but also to the probabilistic models of the last decade, and connect your work to the fundamental questions of computational linguistics. Don’t just have a myopic view of the literature. Make sure that you connect your work in the ways that it needs to be connected. For example, if your reviewer wrote the seminal paper introducing the task that your fancy neural network is trying to solve, and you don’t cite that original task definition, that’s probably a bad omen for your paper.
The next audience that I want to talk about is the not overworked, but maybe not all that experienced, researcher. Think about a second- or third-year grad student who is reading your paper after it’s been published, or maybe even reviewing it; that’s becoming more and more common these days. They have a lot of time to spend on your paper. They don’t know the field as well as professors who have been here for a couple of decades, so they don’t know the broad expanse of the literature, but they know the details and the particulars of your problem very well.
You’ve probably been in this kind of position. You’ve had to implement some existing work or try to understand some existing work, and you looked at the paper and you got confused. So you need to put yourself in the shoes of a struggling researcher who just wants to understand the details of what you did. This may be for reproducibility: they’re coming along in a couple of years and trying to get your test/train split correct, trying to do the same pre-processing that you did. Don’t just leave these details to source code. Source code is great, but not everybody can use the same programming language that you do, and in 20 years that source code may be completely useless or have vanished, and only the paper remains. The paper still needs to be useful for addressing these details. It also matters for reviewing: most of the time, your source code is not in a pretty state at the time you submit the paper, and a lot of the technical questions that these sorts of reviewers will have won’t get answered by the source code. They have to look at the paper, and they’ll be poring over every sentence and every paragraph, so make sure that you have the details in there that would allow someone to actually go in and replicate pretty closely what you actually did in your paper.
For this, it’s important not just to write things that sound impressive, but to give details. Don’t just say that “we use gradient optimization.” Talk about the package that you’re using. If you’re using PyTorch, just say that you’re using PyTorch, and perhaps give some details of, say, the learning rate, things like that, to make it more concrete. Omitting these details doesn’t make you appear more impressive, and including them makes your paper a better contribution to science.
One thing makes this difficult for you as a writer.
Even though you may be a junior grad student yourself, it is hard to write for other junior grad students, because in the course of doing the research you’ve forgotten what you learned along the way. When you started this research project, you were confused about some things. You figured them out, and it took you some time, but you know that cold now. You need to remember the things that you were confused about when you started the research project. Someone reading the paper for the first time, who hasn’t been in this field, will likely have the same confusions that you did, so you need to explain things in the way that would have helped you when you were starting the project. Think about it this way: if you had a time machine to go back and talk to the person who started the research, who doesn’t know all the things that you know now, what would you tell them to make their life easier and to remove those confusions as efficiently as possible?
Speaking of confusion, I want to talk about the last audience that you’ll likely be writing for: members of the media or the general public, who may not have a good idea of what’s going on in your field and might get the wrong idea from reading your paper as written for experts. There are two reasons that it’s important to keep this audience in mind. The first reason is that you don’t want your research to be misconstrued. You don’t want a reporter to look at your project report or a published paper and write some misleading story about it.
Such a story either makes it seem like what you were doing was stupid or trivial or, on the other hand, makes it seem like computers are going to destroy humanity, when in all likelihood you have some real scientific contribution, but it’s probably not going to change the world anytime soon. To make that clear, don’t overclaim. Make sure that your claims are grounded in reality, and if you connect your work to a real application, explain the actual ways that it would change that application. So if you’re talking about dialog processing, don’t say that it will allow Alexa to understand your needs and your motivations. Say that in situations where context matters for deciding the user’s intent, it will improve a personal assistant’s ability to detect that intent. Be precise. Don’t use words that are in the general lexicon without explaining the difference between the general meaning and how you’re using them specifically. For example, if you talk about regret, define regret in the context of reinforcement learning and online learning: say that it is the difference between the optimal reward and the reward your algorithm actually obtains. Don’t let reporters assume that this is now a computer feeling regret because it missed out on some emotional event.
The other reason that you should try to keep this audience in mind when you’re writing your papers is that sometimes you won’t have expert reviewers. You’ll have reviewers who come from a neighboring field or just don’t have the training that they need to review the paper. This is not an ideal situation, but it happens. If you can write a paper that is engaging to someone in the general populace, not just a scientist in your exact area, you can explain what’s going on and make it clear that you actually achieved something worthwhile.
Even if they’re not in your area, if they can get excited about it, that makes for a much stronger paper. If you can simultaneously make a professor who’s been in the field for 30 years happy, give the details that lessen the confusion of a grad student just entering the field, and make a reporter excited, if you can do all three of those things and appeal to those three audiences, you have a really strong paper.
But how do you know if you’ve actually done this? That’s the hardest thing. You can internally simulate these audiences to some degree, but the only true way to know if you’ve done this is to get other people to read it. As a result, this means you have to finish your paper ahead of time and not just write until midnight of the deadline. One way that we do this in our research group is that we have paper clinics, and this helps simulate overworked, knowledgeable reviewers. We have our lab, with grad students and professors. We pass the papers around, and people only have 20 minutes to read a paper, and you see what they took away. Oftentimes, they come away with the wrong impression, and these are the kinds of people who will be reviewing your paper. If you can do this process with your co-workers or your colleagues, that can simulate what a reviewer would actually do. You can also turn to your friends and family, or people in different fields of computer science. Have them read what you’re writing and see if they understand what’s going on. If they do, that’s a really good sign that you’re writing well. It may not mean that you have the greatest research, but if you can explain it well, that counts for a lot, and a well-written paper is more likely to get good reviews, which we’ll talk about next.