Machine learning research as a product. How easy is it to use your work?

Dmytro Mishkin
3 min readJun 4, 2019

--

The best research is when you are answering questions that bother you and it happens to be in line with your company/grant proposal goals. However, the single core message depending on the way it is written and accompanied artifacts can influence different people, shared across different communities, live long or die fast.

How to be more useful for people and gain more citations?

General things

  • Is it easy to grasp the key message fast and present it on the reading group? If no, you are in trouble.
  • Upload to arXiv. You might be against uploading during the reviewing process, but there is no reason not to upload after paper acceptance. And the main reason for uploading is arXiv feed and arxiv-sanity. These are the main ways of how people discover new papers. If you don`t upload there, lots of people never know about your work.
arxiv-sanity One of the main ways of discovering new ML papers now
  • Add experiments on as various datasets as possible. In the short term, you will gather citations by people who don`t care at all about your method, but who need to fill-in the benchmark table with recent/best results.
  • Add interesting side-observations. You never know what people need to write and therefore to back their claim with some reference.
  • If you are brave enough, list the things which don`t work for you. It may decrease (if unlucky) your acceptance chance, but you will save followers a lot of time by this simple list. Moreover, some of them will think: “well, I know how to fix it”. They will write a paper and cite you.
  • That said, all the additional stuff should not make your main story cumbersome. The easiest way is to move the extra to the appendix

If you want to be useful for other researchers in ML/CV field

  • Publish source code on github. This will encourage to build on top of your method and/or improve its parts. Important: the code, which is hard to compile is almost useless, as no code at all. The best is to provide python pip package.
  • Contrary to the previous point, even the worst code in the world is better than nothing — it may help people who are trying to reimplement your approach.
  • Add your paper to https://paperswithcode.com/
  • Instead of just writing the script “reproduce_all_experiments.sh”, write also a script with an example of how to use your method/data on people`s own data.
screenshot from https://paperswithcode.com/

If you want to be useful for applied researchers in other fields

Create end-user-task-examples

Basically, this is the “product” part. The most of people including researchers are not interested in your paper or your domain. But they all have their problems and tasks.

If your method can be easily integrated with some other libraries to produce some meaningful result for end-user, be it doctor, historian or modeler — just do it.

E.g. if you are working with local features, provide a tool to compare two images and output list of correspondences + visualization, like ASIFT or MODS.

ASIFT web-demo for image matching http://demo.ipol.im/demo/my_affine_sift/

Provide Windows binaries/installation files

Lots of researches from non-ML field work on Windows. Give them an opportunity to try your tool!

If it is worth the game, create the video tutorial for users. COLMAP is a great example of both

COLMAP has great tutorials.

These all are quite simple and obvious things, but you will be surprised how many researchers neglect them. Good luck!

--

--

Dmytro Mishkin

Computer Vision researcher and consultant. Co-founder of Ukrainian Research group “Szkocka”.