
OpenRefine, formerly Google Refine and before that Freebase Gridworks, is also known as Refine. It’s a free, incredibly powerful browser-based tool with similarities to spreadsheets, but with considerably more functionality.

OpenRefine runs locally on your computer (Windows, Mac OS X, Linux), although you can set it up to run remotely on a Web server, or use a paid hosted service offered by the founders of Gridworks.

What can Refine do? Well, the list below is just for starters. If any of this means anything to you, you’ll have a sense of how feature-rich OpenRefine is (and there’s much more not mentioned):

  1. Data import from multiple sources and file types
  2. Multiple file format export, including JSON, spreadsheets
  3. Template-based data export – ideal for custom text format output
  4. Project export – for duplication with tweaks
  5. Regular expressions, GREL (General Refine Expression Language, formerly Google Refine Expression Language), BeautifulSoup
  6. Multi-column sorting, with reverse option and partial undo
  7. Facets – for multiple levels of grouping and slicing data
  8. Column collapsing – for viewing convenience
  9. Reconciliation – for transforming one column of information into something else
  10. URL retrieval – for HTML page fetch, with custom delays for Web etiquette
  11. Bulk HTML parsing of a column with CSS-style selectors (GREL’s parseHtml and select)
  12. Bulk JSON parsing of a column
  13. Split a column into multiple columns on several criteria
  14. Split a row into multiple rows
  15. Flag and star – for filtering rows
  16. Cell type conversions – text, boolean, numeric, date
  17. Data filtering with multiple criteria
  18. Data clustering on multiple criteria
  19. Data transformation
  20. Bulk editing
  21. Data massage – to “refine” dirty data into cleaner, more structured data
  22. Records – for grouping rows on a variable
  23. Blank down on column data – for grouping rows as records
  24. Fill-down on column data – reverse of blank-down
  25. Transaction history – for limitless undo operations
  26. Cross-join between Refine projects
  27. Row indexing
  28. Record indexing
  29. Column indexing – for complex multi-column manipulation
  30. Framework for Web crawling
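To make items 23 and 24 a little more concrete, here is a minimal Python sketch of what blank-down and fill-down do to a single column. This is not OpenRefine's own code; the function names `blank_down` and `fill_down` and the sample column are my own illustrative assumptions, and blank cells are represented as `None`.

```python
from typing import List, Optional

Column = List[Optional[str]]


def blank_down(values: Column) -> Column:
    """Blank every cell that repeats the value of the cell directly above it."""
    result: Column = []
    previous: Optional[str] = None
    for value in values:
        if value is not None and value == previous:
            # Repeat of the cell above: blank it out.
            result.append(None)
        else:
            # First cell of a new run (or an already blank cell): keep it.
            result.append(value)
            previous = value
    return result


def fill_down(values: Column) -> Column:
    """Replace every blank cell with the nearest non-blank value above it."""
    result: Column = []
    last_seen: Optional[str] = None
    for value in values:
        if value is not None:
            last_seen = value
        result.append(last_seen)
    return result


# Hypothetical sample column, already sorted so equal values are adjacent.
countries: Column = ["Canada", "Canada", "France", "France", "France"]
blanked = blank_down(countries)
refilled = fill_down(blanked)
```

On a column where equal values sit next to each other, filling down the blanked column gives back the original, which is why the two operations are described as reverses of each other.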

An appropriate description of OpenRefine’s functionality might be “data exploration, massage and transformation.”
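As a rough illustration of the "massage" part, here is a small Python sketch of key-collision clustering (item 18), the idea behind grouping messy spellings of the same value. It is a simplified approximation of a fingerprint-style key, not OpenRefine's actual implementation; the `fingerprint` and `cluster` functions and the sample names are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-duplicate strings collide."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and remove ASCII punctuation.
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Split on whitespace, de-duplicate and sort the tokens, then rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: List[str]) -> List[List[str]]:
    """Group values sharing a fingerprint; keep only groups with 2+ members."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical dirty column of city names.
cities = ["New York", "new york ", "York, New", "Boston", "Montréal", "Montreal"]
clusters = cluster(cities)
```

Each resulting cluster is a set of candidate duplicates that a person would then review and merge into one spelling, which is the same review-then-merge workflow Refine offers in bulk.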

There is a learning curve, but Refine is the single most powerful tool in my data journalism / data science toolkit – aside from maybe R and RStudio. In fact, even after more than two years of regularly using OpenRefine (and four years since discovering it), I am still finding new features. There are a lot of features that you may never use, some you might use occasionally, and others you might even use daily, depending on the kind of data work that you do. I use Refine nearly daily, for both client and personal projects.

I’ll be preparing a series of tutorials about using OpenRefine for general data manipulation and reporting, and how the workflow can be integrated with other data tools, including data visualization.

30 Reasons You Need Google/ OpenRefine for Data Journalism