Localized Search Implementation with Elasticsearch

Is Elasticsearch a good fit for localized search? Let’s check. It is an engine that gives you most of the standard search features out of the box: fast indexed document search, formula-based relevance scoring, autocomplete, context suggestions, analyzer-based localized text matching, and more.

Here I discuss implementing localized search for regional languages, whether or not Elasticsearch (ES) ships an analyzer for them, and how to get good results (for starters), if not the best.

I will use Node.js and ES as the technical stack. Let’s define some standard types for our index schema. I have three cases considered here:

  1. English Analyzer
  2. Hindi Analyzer (ships with ES; see: Language Analyzers)
  3. Standard Analyzer (use it if your language does not have a built-in analyzer in ES)


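const standardTypes = {
  // Exact-match strings; values longer than 256 characters are not indexed
  'KEYWORD_ASCII': {
    'type': 'keyword',
    'ignore_above': 256
  },
  // Fallback for languages without a built-in analyzer
  'STANDARD_TEXT': {
    'type': 'text',
    'analyzer': 'standard'
  },
  // Built-in language analyzers
  'ENGLISH_TEXT': {
    'type': 'text',
    'analyzer': 'english'
  },
  'HINDI_TEXT': {
    'type': 'text',
    'analyzer': 'hindi'
  }
};
module.exports = standardTypes;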

The schema needs to cover all of these standard types. I have chosen three languages to demonstrate search: English, Hindi (native to India), and Telugu (a regional South Indian language with no default analyzer in ES).


'use strict';
…….
  "descriptors": {
    "properties": {
      "english": {
        "properties": {
          "description": standardTypes.ENGLISH_TEXT,
          "name": standardTypes.ENGLISH_TEXT
        }
      },
      "hindi": {
        "properties": {
          "description": standardTypes.HINDI_TEXT,
          "name": standardTypes.HINDI_TEXT
        }
      },
      "telugu": {
        "properties": {
          "description": standardTypes.STANDARD_TEXT,
          "name": standardTypes.STANDARD_TEXT
        }
      }
    }
  }
…..
module.exports = schema;

Telugu goes under the standard analyzer because that analyzer is based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, and works well for most languages. The simple analyzer is another option: it splits text at every character that is not a letter and lowercases the tokens. Check its output on your script with the _analyze API first, though, because it does not follow the same Unicode word-boundary rules.

Now we have a schema defined. Next, create an index with the schema and populate it with related documents. I am not sharing the actual documents I used for testing, but text resources for populating an index are easy to find online. From Node.js, you can use the ES client for Node.js or, more simply, call the ES REST API directly.
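As a rough sketch of that step, the two request bodies involved can be built as plain objects before any client is chosen. The function names, the index name, and the sample document are hypothetical, and the typeless `mappings` layout assumes Elasticsearch 7 or later; older versions expect a type name inside `mappings`.

```javascript
'use strict';

// Field types repeated from the standardTypes module so the sketch runs on its own.
const standardTypes = {
  STANDARD_TEXT: { type: 'text', analyzer: 'standard' },
  ENGLISH_TEXT: { type: 'text', analyzer: 'english' },
  HINDI_TEXT: { type: 'text', analyzer: 'hindi' }
};

// One language block: a "name" and a "description" field sharing the same type.
function languageFields(fieldType) {
  return { properties: { description: fieldType, name: fieldType } };
}

// Body for `PUT /<index>` (assumption: typeless mappings, ES 7+).
function buildCreateIndexBody() {
  return {
    mappings: {
      properties: {
        descriptors: {
          properties: {
            english: languageFields(standardTypes.ENGLISH_TEXT),
            hindi: languageFields(standardTypes.HINDI_TEXT),
            telugu: languageFields(standardTypes.STANDARD_TEXT)
          }
        }
      }
    }
  };
}

// Body for `POST /_bulk`: newline-delimited JSON, one action line followed by
// one document line per document, terminated by a final newline.
function buildBulkBody(indexName, docs) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    lines.push(JSON.stringify(doc));
  }
  return lines.join('\n') + '\n';
}

// Hypothetical sample document: the word "river" in the three languages.
const sampleDocs = [
  {
    descriptors: {
      english: { name: 'river', description: 'a large natural stream of water' },
      hindi: { name: 'नदी' },
      telugu: { name: 'నది' }
    }
  }
];

const createIndexBody = buildCreateIndexBody();
const bulkBody = buildBulkBody('places', sampleDocs);
```

The first object would be sent as the body of `PUT /places`, and the string as the body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header, either through the Node.js client or a plain HTTP request.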

Documents with the above schema support a whole variety of searches across all fields, each field using its own analyzer. [ Full-Text Queries in ES ]
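One of those full-text queries, `multi_match`, fits this schema well because the query text is analyzed separately for each field with that field's analyzer. The sketch below only builds the request body for `POST /<index>/_search`; the function name, the `^2` boost on names, and the default page size are illustrative assumptions rather than settings from my tests.

```javascript
'use strict';

// Field paths come from the schema; the ^2 boost on "name" is an illustrative choice.
const SEARCH_FIELDS = [
  'descriptors.english.name^2',
  'descriptors.english.description',
  'descriptors.hindi.name^2',
  'descriptors.hindi.description',
  'descriptors.telugu.name^2',
  'descriptors.telugu.description'
];

// Build a search body that looks for `text` in every language field at once.
function buildSearchBody(text, size = 10) {
  return {
    size,
    query: {
      multi_match: {
        query: text,
        fields: SEARCH_FIELDS,
        type: 'best_fields' // score each document by its best-matching field
      }
    }
  };
}

// The same function serves any of the three languages.
const teluguSearch = buildSearchBody('నది');
```

Since the user does not have to declare a language, a single search box can cover all three sets of fields.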

I got great search results for English and Hindi, and the results for Telugu were not far below the bar. The ease with which one can create a near-real-time search engine is remarkable. I have not gone into the technical details of analyzers and how they work by combining the appropriate character filters, tokenizer, and token filters. The standard analyzer can only be expected to give acceptable results; it is a starting point. For more accurate results in a regional language, an Elasticsearch user must implement a full-fledged custom analyzer. ES also provides a few add-ons for Asian languages such as Korean and Chinese.

So we can conclude that Elasticsearch is indeed a great way to boost your product’s localization and accessibility at a low time cost and with a high return.
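To make the three building blocks concrete, here is a minimal sketch of index settings that define a custom analyzer from a character filter, a tokenizer, and token filters. The names `telugu_custom` and `telugu_stop` are hypothetical, the stopword list is supplied by the caller, and this is a starting point under those assumptions, not a tuned Telugu analyzer.

```javascript
'use strict';

// Build the `settings` part of an index-creation body.
// `stopwords` is whatever list of common words the caller wants dropped.
function buildTeluguIndexSettings(stopwords) {
  return {
    settings: {
      analysis: {
        filter: {
          // Token filter (hypothetical name): removes the supplied stopwords.
          telugu_stop: { type: 'stop', stopwords }
        },
        analyzer: {
          // Custom analyzer (hypothetical name) wiring the three stages together.
          telugu_custom: {
            type: 'custom',
            char_filter: ['html_strip'], // 1. character filter: strip HTML markup
            tokenizer: 'standard', // 2. tokenizer: UAX #29 word segmentation
            filter: ['lowercase', 'telugu_stop'] // 3. token filters, applied in order
          }
        }
      }
    }
  };
}

// Illustrative call; the two words are only examples of very common Telugu words.
const teluguSettings = buildTeluguIndexSettings(['మరియు', 'ఒక']);
```

A field would then opt in with `{ type: 'text', analyzer: 'telugu_custom' }` in place of the standard analyzer.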

GSoC 2017: Charmap Integration

These awesome three months of summer spent developing for LibreOffice under Google Summer of Code have filled me with great zeal and zest. A plethora of important additions was made to the software bundle under the project titled “Usability of Special Characters”, and these new features will be made available in version 6.0 of LibreOffice (Release Notes for 6.0). Here is a glimpse of what users will be receiving in the new update.

Note: Please zoom in on the web page or open the GIFs in a new tab if the character grid is not clearly visible.

Screenshot from 2017-08-22 21-21-46.png

Special Characters in LibreOffice Master

 

‣ Search functionality via generic code point name

search2.gif

Glyph name properties have been introduced to LibreOffice using the API provided by International Components for Unicode (ICU). The program identifies glyphs by the names ICU provides and then displays the search results. A dedicated display label shows the glyph’s Unicode name.

‣ Inter-font dynamic glyph search

inter-font search.gif

As simple as it could be made, a user can now type the name of the glyph and scroll between fonts until the desired results are shown.

‣ Recently Used Characters and Favorite Characters

recent_special.png

‣ Toolbar Dropdown control for Quick Access!

In pursuance of providing quick access to the above Recent and Favorite character list, a toolbar dropdown control has been developed. It is supposed to replace the current toolbar button which opens the special character dialog in the currently circulated LibreOffice 5.3.

ToolbarDropdown.gif

The GIF below is an example of how easily a user can find the desired symbols and pin them for quick access in the future.

favorites.gif

‣ Context-menu and Mouse click controls for easier interaction

recent.gif

Links to the major patch submissions:

‣ Glyph View and Recent Characters Control in Special Characters dialog https://cgit.freedesktop.org/libreoffice/core/commit/?id=710a39414569995bd5a8631a948c939dc73bcef9

‣ Favourites feature in Special characters https://cgit.freedesktop.org/libreoffice/core/commit/?id=f9efee1f87262b0088c249b2c306fb53ca729b53

‣ Special Characters Toolbar Dropdown Control https://cgit.freedesktop.org/libreoffice/core/commit/?id=800ac37021e3f8859a52c5eebca261a5d3bc5a11

‣ Unicode Character Names Integration using ICU https://cgit.freedesktop.org/libreoffice/core/commit/?id=43d65d1ab81a278e1352f64def9ca63b9e7dfab9

‣ Search feature for Special Characters https://cgit.freedesktop.org/libreoffice/core/commit/?id=e74be9ad773c7769c5d8765bb2ac234967e420ec

I was mentored by Samuel Mehrbrodt, Heiko Tietze, and Thorsten Behrens in GSoC 2017. I would like to thank the LibreOffice community, which helped me through the deadlocks I faced during the project. It has been an awesome two-year journey with LibreOffice. I hope it stays that way in the future, and that open-source technologies flourish and reach their full potential.

Usability of Special Characters: GSoC 2017

Woah, Google Summer of Code with LibreOffice ( x2 ). This time, I’ll be working on improving and reworking the Special Characters feature in LibreOffice and adding some enhancements to it. I will be mentored by Samuel Mehrbrodt, Thorsten Behrens, and Heiko Tietze. I’ll summarize all the proposed changes for the project in this blog.

The Idea

  • Create a way to quickly re-use recently picked special characters, and let the user search the whole character map, which currently has no filter to narrow down results.
  • Allow users to create their own ‘Special Characters’ subset (Individualization)
  • Sorting by last in, first out; items from the list of recently used characters are sorted to the beginning if selected.
  • Create a toolbar dropdown control to easily access recent symbols and the user-defined custom subset.
  • Have a preview along with the Unicode name.
  • Better UI for search (within font subsets) using Unicode name, hex and decimal code.
  • Different subsets within a font need a separation in the special character SvxShowCharSet custom widget.
spclchar

Finalized enhancements for the dialog

Proposal for the toolbar dropdown for quick access to favorites and recently used characters.

spclchar2

Design for the toolbar dropdown.
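The “last in, first out” ordering of recently used characters from the idea list can be sketched as a move-to-front list. This is only an illustration in JavaScript, not the LibreOffice implementation (which is C++); the function name and the limit of 16 entries are assumptions.

```javascript
'use strict';

// Return a new recent-characters list with `ch` moved (or added) to the front.
// `limit` caps the list length; 16 is an arbitrary illustrative default.
function pickCharacter(recent, ch, limit = 16) {
  // Drop any earlier occurrence so the character appears only once.
  const withoutCh = recent.filter((c) => c !== ch);
  // Newest first, oldest entries fall off the end.
  return [ch, ...withoutCh].slice(0, limit);
}

// Picking a character that is already in the list moves it to the beginning.
const afterPick = pickCharacter(['€', '©', 'π'], 'π');
```

Returning a new array instead of mutating the old one keeps the sketch easy to test; a real widget would also persist the list between sessions.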

A lot of challenges need to be addressed while working on this project. It’s about time to play with Unicode data and custom-widgets.

For other queries and discussions, please comment or ping me (IRC nick: Akki) on libreoffice-dev / libreoffice-design channel on Freenode.

GSoC with LibreOffice: Work Product

Google Summer of Code 2016 is coming to an end. It was an awesome experience, and I got to learn something new every time I switched my Linux machine on. I was assigned one project before the start of GSoC with my mentors, Samuel Mehrbrodt and Yousuf Philips, but I managed to complete two projects in the given time frame. Here is the link to my GSoC Introduction blog. This blog post provides an overview of the projects I’ve contributed to LibreOffice during GSoC and their significance in a professional working environment.

List of latest Commits

Total commits count: 61 patches in master and 31 backport commits to LibreOffice 5.2.

LibreOffice gave me commit access soon after the GSoC coding period started. I also committed a few patches from other developers.

I will now continue with an overview of my projects.

Project 1: Redesigning the Template Manager [Blog]

The above link describes all the functionality in the new design. The dialog has gone through a major rework that resolves the defects of the previous template manager, for example visible empty folders.

Initial Commit: https://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-5-2&id=ca040d16d06fead95ad7ed8d10f5995fbade1219  New Template Manager

After the initial commit, I continued with my work and a series of major and minor commits were made to implement the functionalities. The commits can be found in the commit lists provided above. I’ll mention three of the important ones here.

After this project was completed, my mentors and I decided to take on another project. It was Samuel’s idea to create an emoji toolbar control, and we discussed the development process in detail.

Project 2: Emoji Toolbar Control [Blog]

The above blog contains all the information about the emoji control.

Initial Commit: https://cgit.freedesktop.org/libreoffice/core/commit/?id=72e6f08c692c0625db5ce377fb478a99660adb0d  GSoC Emoji Control

Adding Noto Emoji font to LO installation: https://cgit.freedesktop.org/libreoffice/core/commit/?id=ecb096841a1d7b4d468ba111df4ebafc13134c8e

Miscellaneous Tasks

and so on..

There was so much to learn during GSoC and I would like to thank the LibreOffice community for continuously helping and supporting me to complete the projects.

Thank you Google for such an awesome summer program. 👌

Emoji Toolbar Control 😎

The use of emoji in text-based software and editors is well justified. LibreOffice was lacking this cool feature 😐. My mentors, Samuel Mehrbrodt and Yousuf Philips, and I decided to work on emoji integration in LibreOffice as part of my GSoC contributions. We planned to create a working control with proper emoji font support by the end of Summer of Code. This post provides an overview of the work done. The entire description of the idea can be found here: Discussion on Emoji Integration.

User Interface

  • Control to be available in Writer, Impress and Draw, accessible via Standard Bar.
  • No UI idea proposed throughout the discussion. Hence, I was free to use any sort of widgets to facilitate the purpose.
  • Use of tabs to filter emojis based on categories
  • Emojis should appear as a grid
  • Easy insertion, unlike special character control which uses a modal dialog as of version 5.2
  • Font support (Removal of tofu char 🤓 )
  • Functionality should be present as a toolbar control, primarily in Standard bar
  • Proposed search filter (Only after the emoji details are translated and made locale-dependent)

 Backend, JSON Database and Font Support

  • Populate the control by parsing a JSON file which contains the data of all the emoji glyphs
  • Support most of the Unicode glyphs by packaging an emoji-specific font, ‘Noto Emoji’ by Google
  • Insert the Unicode glyph into the editing pad, formatted with the Noto font
  • Orcus JSON Parser to parse the JSON data file
  • Glade for UI designing and custom widget integration
  • Creation of custom widget to render font glyphs as a grid
  • Facilitate glyph insertion using InsertSymbol uno command.
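The first backend step, parsing the JSON data and grouping glyphs into category tabs, can be sketched as follows. This is an illustration in JavaScript rather than the C++ and Orcus code used in LibreOffice, and the entry shape (`name`, `codePoint`, `category`) is a hypothetical one, not the actual layout of the emoji data file.

```javascript
'use strict';

// Parse a JSON array of emoji entries and group the glyphs by category.
// Assumed (hypothetical) entry shape: { name, codePoint: hex string, category }.
function groupByCategory(jsonText) {
  const entries = JSON.parse(jsonText);
  const tabs = new Map(); // category name -> glyphs in file order
  for (const { codePoint, category } of entries) {
    // Turn the hex code point into the actual character to be rendered.
    const glyph = String.fromCodePoint(parseInt(codePoint, 16));
    if (!tabs.has(category)) {
      tabs.set(category, []);
    }
    tabs.get(category).push(glyph);
  }
  return tabs;
}

// Tiny illustrative data set.
const sampleJson = JSON.stringify([
  { name: 'grinning face', codePoint: '1F600', category: 'people' },
  { name: 'beer mug', codePoint: '1F37A', category: 'food' },
  { name: 'smiling face with sunglasses', codePoint: '1F60E', category: 'people' }
]);

const emojiTabs = groupByCategory(sampleJson);
```

Each map key would back one tab in the control, and each array would fill that tab’s grid.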

Emoji Toolbar Control

After some pre-planned development and tricky debugging, I finally managed to get the toolbar control working. The beta version of toolbar control was merged and is ready to use, thanks to the help and guidance I got from my mentor and LO developers. Patch: GSoC Emoji Control: patch which makes all the above changes to LO.

emoji3a.png

Emoji Toolbar Control

Emojis get inserted into the editing pad when you click the respective emoji.

emoji4b.png

One should remember that, unlike SVG or PNG emoji, these emoji are glyphs of a particular Unicode sequence. They have all the properties of normal text. Their colour and size can be changed just like any other character glyph.

emoji48.png

The enhancement still needs some love to become bug free. The emoji control is hidden in the standard bar by default. It can be activated by right-clicking on the standard bar and going to Visible Buttons > Emoji Control. Another way to do so is by using the Customize dialog. Go to Tools > Customize and navigate as shown in the picture below.

emoji5b.png

Activate Emoji control from customize dialog

The foundation for this enhancement has been built now. Further improvements have been proposed like:

  • Filtering of unsupported glyphs
  • Removal of duplicate glyphs (multicolor emoji are treated as duplicate glyphs), as multicolor glyphs are not supported by any font
  • Support for SVG emoji, etc.

Cheers 🍻

Redesigning the Template Manager

Midterms for GSoC 2016 have passed and my first project Redesigning the Template Manager is reaching completion. Therefore, I have decided to encapsulate all the modifications and enhancements that I (along with my mentors, Samuel Mehrbrodt and Yousuf Philips) have made to the Template Manager and some other UI components, namely Start Center, Presentation Wizard, etc. The idea behind the project can be seen here: Introduction: GSoC 2016 with Libreoffice

Template Manager

  • All Templates view replacing the previous Folder view
  • Search, Application and Category filters replacing Tab design
rnotes1

New Template Manager

  • Removal of Save as Template mode (since LibreOffice 5.2)
  • Context Menu for non-browse focused entries (Open, Edit, Set as Default, etc.)
    rnotes2
  • Controls for browse functions such as Import, Export, Move and Online Templates Link
  • Marking of default templates for each application
    blog10.png
  • Dropdown control to create and remove categories, reset the templates for specific applications and refresh the view
    blog11
  • Inbuilt category selection dialog for importing and moving templates
    blog12
  • Title and Category as tooltips when hovering over thumbnails
    blog121.png
  • New ‘Selected + Hover’ state in thumbnail view
    • How to try it? Just select a template. Now hover over a selected template and an unselected template to see the difference.
  • Removal of remote files view (since LibreOffice 5.3)
    • dead and buggy code, created long time ago
    • hidden if experimental mode is not enabled, crashes frequently
    • no way to download previews for each remotely available template
    • LO has dedicated Remote Files Dialog since 5.1, which does the job much better

      blog17.png

      Remote Files Add Service

  • Help, Open and Close controls added as HIG recommends

Save as Template

Template Selection (Impress)

  • Tools > Options > LibreOffice Impress > General > Start with Template Selection
    blog_sel2
  • Enable users to select a template when Impress starts
  • Easy means to disable the dialog on Impress startup
  • Remove Presentation Wizard to push forth the Selection dialog
  • Dialog is Enabled by default

    blog_sel.png

    Impress: Template Selection dialog

Template Menu

  • File > Templates > Open, Save as, Manage
    template_menu
  • Bug 61396 Possibility to edit a template not in Template Repository
    blog113.png

Start Center

  • All Templates view in Start Center with application filter
  • Context menu for templates in start center

    blog111.png

    Start Center Template Selection

Other fixed bugs

Overall, my first project went smoothly, thanks to the help and guidance provided by my mentors, Samuel Mehrbrodt and Yousuf Philips. I had a lot of time remaining in the GSoC and hence, I picked up another project, Emoji Toolbar Control. It’s a new idea mentioned here: Bug 100100: Emoji Toolbar Control. I will also follow it with a blog about the enhancements I made and the problems and issues associated with the new feature.

 

Introduction: GSoC 2016 with Libreoffice

This is my first blog post related to my Google Summer of Code project with LibreOffice. I will be working on the Template Manager of LibreOffice to solve the basic software design problems that surfaced in its previous version. I have been assigned two mentors for my project, Mr. Yousuf (Jay) Philips and Mr. Samuel Mehrbrodt, who will guide me and review my ideas. This blog mainly focuses on keeping all the proposed changes under one roof and will be followed by a series of related blog posts about the actual changes made in LibreOffice.

The Template Manager will undergo a major rework and gain new features. A few insights into the changes and additions are given below:

  • UI concept:

    1. Redesigning the Template Manager

      • New User Interface for better UX
        • Remove the feel of a file manager
      • Fuzzy Search and Filter controls for easier accessibility
      • Use of context menus instead of toolbox controls for non-browse focused entries
      • Removal of tabs (Drop 90’s design)
      • No regression on previous functionalities:
        • Non-browse focused functions (Edit, Set As Default, Delete, etc)
        • Browse focused functions (Move, Export, Import)
      • blog25

        Template Manager mock-up

    2. Making Impress A UX Princess

      • Scrap the ‘Presentation Wizard’ from the Impress module
        • Exclude LibreOffice 5.2
      • Modal dialog for template selection
        • Minimal version of ‘Templates‘ dialog
      • ‘On’ by default with easy means of disabling
        • Impress > Tools > LibreOffice Impress > General > Start with wizard
  • Challenges:

    1. Improve Save as Template workflow

      • Current workflow fails to use Template Manager efficiently
      • Plan to create a new ‘Save as Template’ dialog
    2. Better integration of Template Manager with Start Center

      • Improved accessibility of templates in Start Center
      • Reduce back-end to minimal code
      • Better thumbnail previews
    3. Integration of online templates

      • Not decided yet ( Not mentioned in GSoC project proposal )
      • The current LO site does not provide an API
      • New website (not launched) has a JSON API
    4. CMIS Integration

      • Currently an experimental feature
      • Tools > Options > LibreOffice > Advanced > Enable Experimental Features
      • Not sure about interoperability w.r.t. templates.

 

For other queries and discussions, please comment or ping me (IRC nick: Akki) on libreoffice-dev / libreoffice-design channel on Freenode.