I’m in the final months of my master’s, and if I don’t finish the thesis, I’m out. So this is me trying to get it done, and documenting the process as I go. My master’s program is in Data Science and Artificial Intelligence, and the first question was the one that sounds easiest but isn’t: what am I actually going to write about?
This post is about the part that comes before the writing: choosing a topic, deciding what kind of project I wanted to take on, and drafting an outline that would carry me through the rest of the work. The next post covers the literature review workflow I landed on once the topic was fixed.
Choosing a topic
There are a few reasonable ways into a topic. My supervisor had a list of subjects they were interested in supervising, so that was the first place I looked. Picking something your supervisor already wants to work on has real practical benefits: they already know the literature, they can give you sharper feedback, and your meetings are more useful. If you don’t have a strong opinion of your own, this is usually the path of least resistance.
If you do have your own idea, you can propose it. Whether it works depends on how closely your idea aligns with your supervisor’s expertise and how comfortable they are supervising something slightly outside their usual area. Worth asking, in any case.
If nothing in either direction pulls at you, a good move is to spend a few hours scanning recent issues of journals in your field. Not reading the papers, just reading titles and abstracts. Patterns show up quickly: what people are arguing about, what datasets keep coming up, and where the open problems are. A topic that three recent papers are circling is a topic with momentum, which makes the literature review easier later.
I started with one of my supervisor’s suggestions. It was a deep learning problem on an open-source dataset, and it was genuinely interesting. But I work full-time, and I realized I wouldn’t have the hours to do justice to a topic that sat completely outside my day job. So I switched. I looked at my own field, thought about what kinds of data I already had access to and already understood, and rebuilt the topic around that.
That detour gave me the one opinion I actually hold strongly about topic selection for a data science thesis: if academia is not your plan, pick a topic where you are confident about the data. Confident means you know the dataset exists, you know you can get it, and you know roughly what’s in it. Nothing derails a thesis faster than spending month two of a six-month project still trying to scrape, license, or build a dataset from scratch. Collecting original data is a thesis in itself; if you have time for that, great. If you don’t, choose a project where the dataset is either public, already in your hands, or accessible through your supervisor’s connections.
This is an unromantic take. An academically pure answer would be “follow your curiosity, the data will sort itself out.” I don’t think that’s wrong. I just think it’s a luxury that depends on what kind of time you have and what you want the thesis to do for you. If your goal is to finish a solid piece of work in the window you have, data availability is one of the most important filters, and it’s worth applying early.

Drafting the outline
Once the topic was settled, I opened a blank document and wrote an outline. Just section headings, before any literature review, before any analysis. The point was to get the shape of the thing onto the page so I could see it.
A typical structure for a data science thesis isn’t surprising: Introduction, Methods, Results and Discussion, Conclusion, References. Inside each section, there are sub-headings, and that’s where the actual thinking happens. In the Introduction, for example, I knew I’d need a general motivation, a more specific description of the problem, and a statement of the research objectives. Under Methods, I knew I’d need data description, preprocessing, modeling, and evaluation, but I deliberately left the inner structure of Methods loose.
That last part matters. The Methods section can’t be fully outlined upfront because its final shape depends on what the analysis actually turns into. You might plan to use one family of models and end up using another. Writing a detailed Methods outline before doing the analysis is a way of writing a document you’ll have to rewrite. It’s simpler to leave it as a placeholder and let the section grow as the work grows.
The Introduction is different. You can outline it in reasonable detail before doing much else, because its job is to motivate and frame the problem, and that framing comes from the literature more than from your own results. That also means the Introduction outline evolves the most during the literature review. As I read, I moved sub-sections around, collapsed some, and added others. By the time I finished reading, the outline looked noticeably different from the first draft, and that was the point.
The value of the outline isn’t that it’s correct. It’s that it exists. Having a document with headings turns “write a thesis” from an abstract horror into a list of smaller writing tasks, each with a defined scope. It also makes the literature review directional: you’re not reading to know everything, you’re reading to fill in specific sub-sections. That constraint is what keeps a literature review from becoming a six-month black hole.
This is just my own experience. I’m sure there are more structured and professional ways to approach all of this. I’m sharing what I actually did.
Where this goes next
The next post is about the literature review itself: the tools I used, how I picked an anchor paper, and how I moved from Google search to Connected Papers to actually writing paragraphs with references.
